Search

Saturday, August 13, 2016

How to Troubleshoot Connectivity Issue with 11gR2 RAC SCAN listerner , VIP



How to Troubleshoot Connectivity Issue with 11gR2 RAC  SCAN  listerner  , VIP

I installed the Oracle 11g RAC successfully, But when tried connecting from remote client via SCAN, it used to raise ORA-12537. This is the most common issues that the happens with SCAN listener. And there are few common mistakes that generally cause this issue..
So, here is how we can troubleshoot the connectivity issues with SCAN, and cases out the possibilities.

1)  Check if Local_listener and remote_listener parameter are set properly on all nodes.

2)  The very common issue is with permissions. SCAN will always be created under grid user (Grid cluserware installation user). Oracle will also create one local listener “LISTENER” during grid infrastructure installation. But if that is not present then always make sure that you create a local listener with grid user. This is required to handover the connection between remote and local listener. 

3)  Also “oracle” executable should have given to oracle and grid user i.e. 6751.  Under $ORACLE_HOME/bin. If permission are not proper the use root user to set the permissions. Else it will raise error (ORA-12537, )
-rwsr-s--x 1 oracle oinstall 106M Aug 25 2012 oracle

4)  Your database instance is registered with Local Listener? 

5)  Check if scan config are proper and showing the details similar to :
[oracle@mynode] srvctl config scan_listener
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521

[oracle@mynode] srvctl config scan
SCAN name: sales1-scan, Network: 1/133.22.67.0/255.255.255.0/
SCAN VIP name: scan1, IP: /sales1-scan.example.com/133.22.67.192
SCAN VIP name: scan2, IP: /sales1-scan.example.com/133.22.67.193
SCAN VIP name: scan3, IP: /sales1-scan.example.com/133.22.67.194

6)  Check if all above 3 SCAN IPs are responding the ping.

7)  Your Oracle Client must be 11g R2 and above release in order to use SCAN functionality. 

8)  Following is the Oracle Document ID 975457.1. Which helped me for ORA-12537. Last case was valid for me.
 
9)  If the following all settings are proper then try to enable SQL Net tracing on Client site and check if the packer size between client and server are compatible.
(This you can confirm by connecting the oracle with normal connection. i.e. try using SQL>conn user/password@server-ip/oracle_sid.  if this is not working then you need to check packet size on TCP else you have oracle client 11gR2 then you may need to troubleshoot it with https://support.oracle.com only J)

How to Troubleshoot Connectivity Issue with 11gR2 SCAN Name [ID 975457.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later   [Release: 11.2 and later ]
Information in this document applies to any platform.
Goal
The goal of this document is to provide checklist when connection through 11gR2 Grid Infrastructure (CRS) SCAN name to database fails.
Solution
Troubleshooting Steps
When client program connects to RAC database through SCAN name, SCAN listener will accept t he request and redirect the connection to local listener. To identify connection issue, first try to connect to each local listener through node VIP, then try each SCAN listener through each SCAN VIP.

To test through node VIP:
sqlplus <username>/<password>@<nodename-vip.domain>:<local-listener-port>/<service-name>

Example:

sqlplus scott/tiger@racnode1-vip.us.eot.com:1521/testsvc

Repeat the same test for all local listener/node VIP in the cluster.

If GNS is used, node VIP name will be in the format of nodename-vip.gnssubdomain (example racnode1-vip.us.eot.com)

If connection through local listener fails, check whether service/instance is registered properly to that local listener with "lsnrctl service <local-listener-name>".

To test through SCAN VIP address:
sqlplus <username>/<password>@<scan-ipn>:<scan-listener-port>/<service-name>

Example:

sqlplus scott/tiger@120.0.0.205:1521/testsvc

Note it's IP address instead of SCAN name

Repeat the same command for all SCAN IP
If connection through SCAN listener fails, check whether service/instance is registered properly to that SCAN listener with "lsnrctl service <SCAN-listener-name>".

Other client tool (JDBC or such) can also be used to test connection though sqlplus is preferred for the purpose of testing.

Example Output
Configuration
Below is an example output from a 2-node cluster with the following configuration:
  SCAN name and VIP:
nslookup eotcs.us.oracle.com
..
Name:   eotcs.us.oracle.com
Address: 120.0.0.207
Name:   eotcs.us.oracle.com
Address: 120.0.0.205
Name:   eotcs.us.oracle.com
Address: 120.0.0.206
Ping doesn't have to go through if ICMP is disabled but should return correct IP for corresponding name. 
ping -c 1 eotcs.us.oracle.com
PING eotcs.us.oracle.com (120.0.0.207) 56(84) bytes of data.
64 bytes from 120.0.0.207: icmp_seq=1 ttl=64 time=3.37 ms
..

ping -c 1 eotcs.us.oracle.com
PING eotcs.us.oracle.com (120.0.0.206) 56(84) bytes of data.
64 bytes from 120.0.0.206: icmp_seq=1 ttl=64 time=1.85 ms
..

ping -c 1 eotcs.us.oracle.com
PING eotcs.us.oracle.com (120.0.0.205) 56(84) bytes of data.
64 bytes from 120.0.0.205: icmp_seq=1 ttl=64 time=2.45 ms
..

ping -c 1 eotcs
PING eotcs.us.oracle.com (120.0.0.207) 56(84) bytes of data.
64 bytes from eotcs.us.oracle.com (120.0.0.207): icmp_seq=1 ttl=64 time=3.18 ms


  Node Public Name/IP Address
Name:   eyrac1f.us.oracle.com Address: 120.0.0.111
Name:   eyrac2f.us.oracle.com Address: 120.0.0.112


ping -c 1 eyrac1f.us.oracle.com
PING eyrac1f.us.oracle.com (120.0.0.111) 56(84) bytes of data.
64 bytes from eyrac1f.us.oracle.com (120.0.0.111): icmp_seq=1 ttl=64 time=3.36 ms
..

ping -c 1 eyrac2f.us.oracle.com
PING eyrac2f.us.oracle.com (120.0.0.112) 56(84) bytes of data.
64 bytes from eyrac2f.us.oracle.com (120.0.0.112): icmp_seq=1 ttl=64 time=3.37 ms
..

  Nodes VIP Name/IP Address
Name:   eyrac1fv.us.oracle.com Address: 120.0.0.211
Name:   eyrac2fv.us.oracle.com Address: 120.0.0.212


nslookup eyrac1fv.us.oracle.com

..
Name:   eyrac1fv.us.oracle.com
Address: 120.0.0.211

nslookup eyrac2fv.us.oracle.com

..
Name:   eyrac2fv.us.oracle.com
Address: 120.0.0.212


ping -c 1 eyrac1fv.us.oracle.com
PING eyrac1fv.us.oracle.com (120.0.0.211) 56(84) bytes of data.
64 bytes from eyrac1fv.us.oracle.com (120.0.0.211): icmp_seq=1 ttl=64 time=4.04 ms
..

ping -c 1 eyrac2fv.us.oracle.com
PING eyrac2fv.us.oracle.com (120.0.0.212) 56(84) bytes of data.
64 bytes from eyrac2fv.us.oracle.com (120.0.0.212): icmp_seq=1 ttl=64 time=1.98 ms
..
  Database Name: b2no
  Service Name: sno
  TNS Connection String
sno =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = eotcs.us.oracle.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = sno)
    )
  )

1. Checklist on RAC Cluster Nodes
Prior to the following checking, please set environment variable GRID_HOME to home of 11.2 Grid Infrastructure installation, for example:
    GRID_HOME=/ogrid/gbase
    export GRID_HOME

Please note:

  * Oracle Network related files (sqlnet.ora, tnsnames.ora, listener.ora etc) are in $TNS_ADMIN or $ORACLE_HOME/network/admin if TNS_ADMIN is not set.
  * This note assumes SCAN VIP and node VIP are all up and running (can be verified through "srvctl status nodeapps" and "srvctl status scan" or "crsctl stat res"), troubleshooting of SCAN VIP or node VIP startup issue is out of scope of this note.


 A. SCAN Listener Resource Status

      A1. SCAN Configuration:
$GRID_HOME/bin/srvctl config scan

SCAN name: eotcs.us.oracle.com, Network: 1/120.0.0.0/255.255.255.0/eth3
SCAN VIP name: scan1, IP: /120.0.0.206/120.0.0.206
SCAN VIP name: scan2, IP: /120.0.0.207/120.0.0.207
SCAN VIP name: scan3, IP: /120.0.0.205/120.0.0.205

      A2. SCAN Listener Configuration:
$GRID_HOME/bin/srvctl config scan_listener
SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521

      A3. SCAN Listener Resource Status:
$GRID_HOME/bin/crsctl stat res -w "TYPE = ora.scan_listener.type"
NAME=ora.LISTENER_SCAN1.lsnr
TYPE=ora.scan_listener.type
TARGET=ONLINE
STATE=ONLINE on eyrac1f

NAME=ora.LISTENER_SCAN2.lsnr
TYPE=ora.scan_listener.type
TARGET=ONLINE
STATE=ONLINE on eyrac2f

NAME=ora.LISTENER_SCAN3.lsnr
TYPE=ora.scan_listener.type
TARGET=ONLINE
STATE=ONLINE on eyrac2f


    B. SCAN Listener Status and Service
Log on to corresponding RAC node to find out SCAN listener status and service once SCAN listener resource status is confirmed. All SCAN listener should have same service served. Please set ORACLE_HOME environment variable prior to run any lsnrctl command, for example:
ORACLE_HOME=$GRID_HOME
export ORACLE_HOME


      B1. SCAN Listener Status:
$GRID_HOME/bin/lsnrctl status LISTENER_SCAN2
..
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))
STATUS of the LISTENER
------------------------
..
Listener Parameter File   /ogrid/gbase/network/admin/listener.ora
Listener Log File         /ogrid/gbase/log/diag/tnslsnr/eyrac2f/listener_scan2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN2)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=120.0.0.207)(PORT=1521)))
..


      B2. SCAN Listener Service:
$GRID_HOME/bin/lsnrctl service LISTENER_SCAN2
..
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))
Services Summary...
Service "b2no" has 2 instance(s).
  Instance "b2no1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac1fv)(PORT=1521)))
  Instance "b2no2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac2fv)(PORT=1521)))
Service "sno" has 2 instance(s).
  Instance "b2no1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac1fv)(PORT=1521)))
  Instance "b2no2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac2fv)(PORT=1521)))


    C. Node Listener Status and Service

      C1. Node Listener Status:
$GRID_HOME/bin/lsnrctl status LISTENER
..
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
..
Listener Parameter File   /ogrid/gbase/network/admin/listener.ora
Listener Log File         /home/oracle/app/oracle/diag/tnslsnr/eyrac2f/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=120.0.0.112)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=120.0.0.212)(PORT=1521)))
Services Summary...
Service "b2no" has 1 instance(s).
  Instance "b2no2", status READY, has 1 handler(s) for this service...
Service "sno" has 1 instance(s).
  Instance "b2no2", status READY, has 1 handler(s) for this service...


      C2. Node Listener Service:
$GRID_HOME/bin/lsnrctl service LISTENER
..
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
Services Summary...
Service "b2no" has 1 instance(s).
  Instance "b2no2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER
Service "sno" has 1 instance(s).
  Instance "b2no2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         LOCAL SERVER



    D. Database Service Status

      D1. Service Resource Configuration
$GRID_HOME/bin/srvctl config service -d b2no -s sno -a

Service name: sno
Service is enabled
Server pool: b2no_sno
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 2
TAF failover delay: 20
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Preferred instances: b2no1,b2no2
Available instances:


      D2. Service Resource Status:
$GRID_HOME/bin/srvctl status service -d b2no -s sno -v

Service sno is running on instance(s) b2no1,b2no2


    E. Instance Listener Parameter Setting:
      E1. remote_listener setting:
For 11gR2 database
SQL> show parameter remote_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
remote_listener                      string      eotcs.us.oracle.com:1521

For pre-11gR2 database
SQL> show parameter remote_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
remote_listener                      string      (ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = 120.0.0.206)(PORT = 1521))(ADDRESS = (PROTOCOL = TCP)(HOST = 120.0.0.207)(PORT = 1521))(ADDRESS = (PROTOCOL = TCP)(HOST = 120.0.0.205)(PORT = 1521)))

OR

remote_listener                      string      LISTENERS_SCAN

Note tnsnames.ora must have the following entry for LISTENERS_SCAN

LISTENERS_SCAN =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 120.0.0.206)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 120.0.0.207)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 120.0.0.205)(PORT = 1521))
  )

If sqlnet.ora does not contain EZCONNECT in NAMES.DIRECTORY_PATH list, remote_listener should set to LISTENERS_SCAN as in above example.

      E2. local_listener setting:
        For Instance1:
SQL> show parameter local_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      (DESCRIPTION=(ADDRESS_LIST=(AD
                                                 DRESS=(PROTOCOL=TCP)(HOST=eyra
                                                 c1fv)(PORT=1521))))

        For Instance2:
SQL> show parameter local_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      (DESCRIPTION=(ADDRESS_LIST=(AD
                                                 DRESS=(PROTOCOL=TCP)(HOST=eyra
                                                 c2fv)(PORT=1521))))

2. Checklist on Client
A successful tnsping to TNS connection string for SCAN doesn't necessarily mean the connection will be successful, client should be able to resolve to SCAN name, node VIP name

For supported client version, refer to section "VERSION AND BACKWARD COMPATIBILITY" of the following:

http://www.oracle.com/technetwork/database/clustering/overview/scan-129069.pdf


    A. SCAN Name Resolution
nslookup and ping of SCAN name should return correct SCAN VIP(s), ORA-12545 could be reported if client can't resolve SCAN name properly


    B. Node VIP name:
By default, pfile/spfile parameter local_listener is set to short node VIP name instead of FQDN name, client need to be able to resolve to short VIP name as well as FQDN name; for example with following local_listener setting, client should be able to resolve short VIP name:
SQL> show parameter local_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      (DESCRIPTION=(ADDRESS_LIST=(AD
                                                 DRESS=(PROTOCOL=TCP)(HOST=eyra
                                                 c1fv)(PORT=1521))))

ping -c 1 eyrac1fv
PING eyrac1fv.us.oracle.com (120.0.0.211) 56(84) bytes of data.
64 bytes from eyrac1fv.us.oracle.com (120.0.0.211): icmp_seq=1 ttl=64 time=4.04 ms
..

ping -c 1 eyrac2fv
PING eyrac2fv.us.oracle.com (120.0.0.212) 56(84) bytes of data.
64 bytes from eyrac2fv.us.oracle.com (120.0.0.212): icmp_seq=1 ttl=64 time=1.98 ms
..
If client can resolve FQDN node VIP name but not short node VIP name (client in different domain), ORA-12537 could be reported and pfile/spfile local_listener need to be adjusted with FQDN node VIP name:
SQL> alter system set local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac1fv.us.oracle.com)(PORT=1521))))' sid='b2no1';

SQL> show parameter local_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      (DESCRIPTION=(ADDRESS_LIST=(AD
                                                 DRESS=(PROTOCOL=TCP)(HOST=eyra
                                                 c1fv.us.oracle.com
)(PORT=1521))))

Once instance updated local_listener setting to listeners, SCAN listener should have similar output like following:
$GRID_HOME/bin/lsnrctl service LISTENER_SCAN2
..
Services Summary...
Service "b2no" has 2 instance(s).
  Instance "b2no1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:4 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac1fv.us.oracle.com)(PORT=1521)))
  Instance "b2no2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:3 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac2fv)(PORT=1521)))
Service "sno" has 2 instance(s).
  Instance "b2no1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:4 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac1fv.us.oracle.com)(PORT=1521)))
  Instance "b2no2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:3 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=eyrac2fv)(PORT=1521)))

Note for node1 it's FQDN name but for node2 it's still short name as node2 is not updated yet

If for some reason, client can't resolve FQDN node VIP name nor short node VIP name, pfile/spfile local_listener need to be adjusted with IP of VIP name:
SQL> alter system set local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=120.0.0.211)(PORT=1521))))' sid='b2no1';

SQL> show parameter local_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      (DESCRIPTION=(ADDRESS_LIST=(AD
                                                 DRESS=(PROTOCOL=TCP)(HOST=120.
                                                 0.0.211
)(PORT=1521))))

How to add disk to ASM

How to add disk to ASM (DATABASE) runing in production server
We have database running on ASM, after two years we faced the problem of space deficiency.
Now we planed to add disk to ASM diskgroup DATAGROUP.
SQL> @asm
NAME                 TOTAL_GB                FREE_GB
------------------------------ ---------- ----------
DATAGROUP            249.995117             15.2236328
IDXGROUP             149.99707              10.4892578

Steps are below
1) Create partition of disk /dev/sdm which we got new LUN from Storage
[root@rac-node1 ~]# fdisk -l /dev/sdm
Disk /dev/sdm: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdm doesn't contain a valid partition table
[root@rac-node1 ~]# fdisk /dev/sdm
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 10443.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
 (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
 e extended
 p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-10443, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-10443, default 10443):
Using default value 10443
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@rac-node1 ~]# fdisk -l /dev/sdm
Disk /dev/sdm: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdm1 1 10443 83883366 83 Linux
[root@rac-node1 ~]#


2) Configure the disk /dev/sdm1 to ASM and giving LABEL DATA5
[root@rac-node1 ~]# /etc/init.d/oracleasm createdisk DATA5 /dev/sdm1
Marking disk "DATA5" as an ASM disk: [ OK ]
[root@rac-node1 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@rac-node1 ~]# /etc/init.d/oracleasm listdisks
DATA3
DATA4
DATA5
DISK1
DISK2
INDEX2
INDEX5
[root@rac-node1 ~]#
Scandisks on RAC -node2
[root@rac-node2 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@rac-node2 ~]# /etc/init.d/oracleasm listdisks
DATA3
DATA4
DATA5
DISK1
DISK2
INDEX2
INDEX5
Add the disk to /etc/rawdevices
[root@rac-node2 bin]vi /etc/sysconfig/rawdevices
/dev/raw/raw6 /dev/sdm1    ==> add this to rawdevices file
And added to /etc/rc.local for permission on reboot
[root@rac-node2 bin]#vi /etc/rc.local

chmod 660 /dev/raw/raw6

Check the disk status 
SQL> set linesize 9999
SQL> ;
 SELECT
 NVL(a.name, '[CANDIDATE]') disk_group_name
 , b.path disk_file_path
 , b.name disk_file_name
 , b.failgroup disk_file_fail_group
 FROM
 v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
 ORDER BY
* a.name
SQL> /
DISK_GROUP_NAME DISK_FILE_PATH DISK_FILE_NAME DISK_FILE_FAIL_GROUP
------------------------------ --------------
DATAGROUP       ORCL:DISK1 DISK1 DISK1
DATAGROUP       ORCL:INDEX5 INDEX5 INDEX5
DATAGROUP       ORCL:DATA4 DATA4 DATA4
DATAGROUP       ORCL:DATA3 DATA3 DATA3
IDXGROUP        ORCL:DISK2 DISK2 DISK2
IDXGROUP        ORCL:INDEX2 INDEX2 INDEX2
[CANDIDATE]    ORCL:DATA5 ==> this is the new disk 
7 rows selected.


3) Add disk DATA5 to diskgroup DATAGROUP
SQL> alter diskgroup DATAGROUP ADD DISK 'ORCL:DATA5' ;
Diskgroup altered.
Check disk status again
 
 
SQL> SELECT
 2 NVL(a.name, '[CANDIDATE]') disk_group_name
 , b.path disk_file_path
 , b.name disk_file_name
 3 4 5 , b.failgroup disk_file_fail_group
 6 FROM
 7 v$asm_diskgroup a RIGHT OUTER JOIN v$asm_disk b USING (group_number)
 8 ORDER BY
 9 a.name;
DISK_GROUP_NAME DISK_FILE_PATH ISK_FILE_NAME DISK_FILE_FAIL_GROUP
----------------------------------------------------------
DATAGROUP        ORCL:INDEX5 NDEX5 INDEX5
DATAGROUP        ORCL:DATA4 ATA4 DATA4
DATAGROUP        ORCL:DISK1 ISK1 DISK1
DATAGROUP        ORCL:DATA5 ATA5 DATA5
DATAGROUP        ORCL:DATA3 ATA3 DATA3
IDXGROUP         ORCL:INDEX2 NDEX2 INDEX2
IDXGROUP         ORCL:DISK2 ISK2 DISK2
7 rows selected. There is no candidates any more for DATA5

SQL> host cat script/asm.sql
select name,TOTAL_MB/1024 total_gb,free_mb/1024 FREE_GB from v$asm_diskgroup;
NAME             TOTAL_GB        FREE_GB
------------------------------ ---------- ----------
DATAGROUP       329.992188      95.21875
IDXGROUP        149.99707       10.4892578
SQL>
Completed.

RAC STARTUP SEQUENCE 11GR2

  

My pictorial version
Oracle High Availability Service (OHAS) is the first process to start of the GI stack on a node via init(1).
In a cluster, OHASD runs as “root”, while on Oracle Restart Environments, it runs as “oracle”.
Administrators can issue cluster wide commands using OHASD.
OHASD will start even if GI is explicitly disabled.
In 11gR2 and later, there are 2 new types of agent processes: Oracle Agent and Oracle Root Agent.
There are 2 sets of Oracle Agents and Oracle Root Agents, one for HAS stack and the other for CRS stack.
The Oracle Agent and Oracle Root Agent belonging to HAS stack are started byohasd daemon and Oracle Agent is owned by GI software owner which is responsible to start resources that do not require “root” privilege.
The Oracle Agent and Oracle Root Agent belonging to CRS stack are started bycrsd daemon.
OHAS starts 2 “Oracle Root Agents” –
First is spawned by OHAS. It initializes resources that need elevated OS privileges.
CSSD and CRSD daemons are mainly created.
– CRSD starts the next root agent and it requires elevated privileges which are mainly network related resources.
OHAS also starts 2 “Oracle Agents”.
– One starts resources for accessing OCR and Voting disk files. It is created by the OHAS daemon.
– Another is created by CRS daemon (CRSD). It starts all resources that do not require root access.
It runs with GI owner’s privileges. Its tasks was earlier performed by “racg” until 11.1 OPROCD process was replaced by “cssdagent” process from 11.2 GI. It was started with release 11.2.0.4 patchset for I/O fencing. In earlier releases, it was handled by “kernel hangcheck-timer module”.
EVMD publishes Oracle Clusterware events to all nodes of the cluster like, ability to start and stio nodes, instances and services.
Event Manager Logger (EVMLOGGER) daemon will be started by evmd and it subscribes to a list of events read from a configuration file, and it runs user-defined actions when those events occur. This daemon is intended for backward compatibility.
ONS,eONS is a publish-and-subscribe service meant for communicating FAN events to interested clients in the environment.
CTSS is Cluster Time Synchronization Service. It is an alternative to NTP server. It runs as an OBSERVER when NTP is available, or, as ACTIVE if NTP is not available, in which case the first node to start in the cluster becomes the master clock reference and all nodes joining later, will become slaves.
CSSD has 3 separate processes:
1. CSS Daemon (ocssd),
2. CSS Agent (cssdagent), &
3. CSS Monitor (cssdmonitor).
CSSDAGENT is responsible for spawning CSSD and is created by OHASD.
CSSDMONITOR monitors CSSD and overall node health is also spawned by OHAS.
ASM disk headers have metadata from 11.2 to allow CSSD start with the voting disks in ASM. “kfed” can be used to read teh header of an ASM disk.
“kfdhdb.vfstart” and “kfdhdb.vfend” fields tell CSS where to find the voting disks. This does not require the ASM to be up.Once the voting disks are identified, CSS can access and join the cluster.
Important daemons:
S.No Component
11.1 Clusterware
11.2 Grid Infrastructure
Linux Process Comment Linux Process Comment
1 CRS crsd.bin Runs as root crsd.bin Runs as root
2 CSS init.cssd, ocssd and ocssd.bin Except for ocssd.bin, rest 2 run as root ocssd.bin, cssdmonitor and cssdagent
3 EVM evmd, evmd.bin and evmlogger Evmd runs as root evmd.bin and evmlogger.bin
4 ONS ons


5 ONS/eONS

ons/eons ONS is Oracle Notification Service. eONS is a Java Process.
6 OPROCD oprocd Runs as root and provides node fencing instead of hangcheck timer kernel module

7 RACG racgmain and racgimon


8 CTSS

octssd.bin Runs as root
9 Oracle Agent

oraagent.bin
10 Oracle Root Agent

orarootagent Runs as root
11 Oracle High Availability Service

ohasd.bin Runs as root through init.
Extra Notes:
CRSD is run as “root” and restarts automatically, if it fails.
OCR resource configuration includes definitions of dependencies on other cluster resources, timeouts, retries, assignment and failover policies.
IN RAC, CRS is responsible for monitoring DB instances, listeners and services, and restarting them in case of a failure.
In a single instance Oracle Restart environment, application resources are managed by “ohasd” and not by “crsd”.
If “cssdagent” discovers “ocssd” has stopped, then it shuts down the node to guarantee data integrity.
“diskmon” daemon provides I/O fencing for Exadata storage.
=========================================
“Oracle Agent” from the HA stack starts the below:
1. EVMD and EVMLOGGER
2. “gipcd”
3. “gpnpd”
4. “mDNSd”
“Oracle Root Agent” spawned by HA stack, starts all daemons that require “root” privilege. Below are started.
1. crsd
2. ctssd
3. diskmon daemon.
4. ACFS daemon.
Once CRS is started, it will create another Oracle Agent and Oracle Root Agent.
If GI is owned by “grid” account, a second Oracle Agent is created and will be responsible for
– Starting and monitoring local ASM instance.
– ONS and eONS daemons.
– SCAN listener.
– Node Listener.
If GI is owned by the same owner as the RDBMS binaries, then the “oracle” Oracle agent will perform the tasks listed previously by the “grid” Oracle Agent.
“oracle” root agent will create the following background processes:
1. GNS, if available.
2. GNS VIP, if enabled.
3. ACFS registry.
4. Network.
5. SCAN VIP
6. Node VIP.
The functionality provided by the Oracle Agent Process in 11gR2 was provided by racgmain and racgimon background processes in earlier releases.
as provided by racgmain and racgimon background processes in earlier releases.


Where is the Voting disk located?
With Oracle 11gR2 the voting disk can be located in Automatic Storage Management devices.
To display the devices which are used for the voting disk execute the following command:
$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
—  —–    —————–                ——— ———
1. ONLINE   0aa25b63b2434f80bf5cf601b0691ef7 (/dev/sdh) [CLUSTERDATA]
2. ONLINE   b0f893e773f04f3fbfb89f66a9c86f67 (/dev/sdf) [CLUSTERDATA]
3. ONLINE   961b89bdd7024fc7bf82c5e5902a91bb (/dev/sdk) [CLUSTERDATA]
Located 3 voting disk(s).
It is possible to do the same, using an asm query, as you can see below the votingdisk is located in the clusterdata diskgroup.
SQL> select group_number,name,voting_files from v$asm_diskgroup;
GROUP_NUMBER NAME                           V
———— —————————— –
1 CLUSTERDATA                    Y
2 DATA                           N
3 SOFTWARE                       N
When we use asmcmd to look for the voting files, we don’t see them in the clusterdata diskgroup. So where is it located?
Using kfed and the filename displayed from the output of crctl query css votedisk, we can see the starting and end point of the voting disk.
SQL> !kfed read /dev/sdh | more

kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:        CLUSTERDATA_0000 ; 0x028: length=16
kfdhdb.grpname:             CLUSTERDATA ; 0x048: length=11
kfdhdb.fgname:         CLUSTERDATA_0000 ; 0x068: length=16

kfdhdb.grpstmp.lo:            815948800 ; 0x0e8: USEC=0x0 MSEC=0x99 SECS=0xa MINS=0xc
kfdhdb.vfstart:                      96 ; 0x0ec: 0x00000060  <=== starting point
kfdhdb.vfend:                       128 ; 0x0f0: 0x00000080  <=== end point
kfdhdb.spfile:                        0 ; 0x0f4: 0x00000000
So on the device starting at block 96 until 127 we can find one of the voting disks.
using DD command we can dump the blocks, but as we don’t have dd default on windows we can also query one of the internal X$ tables.
example to use dd: dd if=/dev/sdh bs=1M skip=95 count=32 | od -c
We use the previous displayed diskgroup number and als the result of the kfed vfstart and vfend.
select GROUP_KFDAT,NUMBER_KFDAT,AUNUM_KFDAT,FNUM_KFDAT from x$kfdat where group_kfdat=1 and aunum_kfdat between 96 and 127
This will display allocation unit 96 untill 127, where the group_number is 1, beside the allocation unit it will display the disk and internal file number.
SQL> select GROUP_KFDAT,NUMBER_KFDAT,AUNUM_KFDAT,FNUM_KFDAT from  x$kfdat where group_kfdat=1 and aunum_kfdat between 96 and 127;
GROUP_KFDAT NUMBER_KFDAT AUNUM_KFDAT FNUM_KFDAT
———– ———— ———– ———-
1            0          96    1048572
1            0          97    1048572
1            0          98    1048572
1            0          99    1048572
1            0         100    1048572
1            0         101    1048572
……
1            0         116    1048572
1            0         117    1048572
GROUP_KFDAT NUMBER_KFDAT AUNUM_KFDAT FNUM_KFDAT
———– ———— ———– ———-
1            0         118    1048572
1            0         119    1048572
1            0         120    1048572
1            0         121    1048572
1            0         122    1048572
1            0         123    1048572
1            0         124    1048572
1            0         125    1048572
1            0         126    1048572
1            0         127    104857