Search

Friday, August 12, 2016

CRSD.BIN does not start

CRSD.BIN does not start

  • Note: in 11.2 ASM starts before crsd.bin, and brings up the diskgroup automatically if it contains the OCR.
  • Typical Problems   :
    • OCR not acessible
    • Networking problems ( includes firewall, Nameserver problems and Network related errors )     ==> For futher details see GENERIC Networking troubleshooting chapter
    • Common File Protection problems ( Oracle executable, Log Files, IPC sockets )     ==> For futher details see GENERIC File Protection troubleshooting chapter
  
Case #1: OCR not accessbible - CRS-5019 error

Reported error in ocssd.log               : [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
Reported Clusterware Error in CW alert.log: CRS-5019:All OCR locations are on ASM disk groups [OCR3], 
                                            and none of these disk groups are mounted. 
                                            Details are at "(:CLSN00100:)" in ".../ohasd/oraagent_grid/oraagent_grid.log".
Testing scenario :
In file  /etc/oracle/ocr.loc
Change entry  ocrconfig_loc=+OCR to ocrconfig_loc=+OCR3 

Clusterware status :
# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

# crsi
NAME                           TARGET     STATE           SERVER       STATE_DETAILS   
-------------------------      ---------- ----------      ------------ ------------------
ora.asm                        ONLINE     INTERMEDIATE    grac41       OCR not started
ora.cluster_interconnect.haip  ONLINE     ONLINE          grac41         
ora.crf                        ONLINE     ONLINE          grac41         
ora.crsd                       ONLINE     OFFLINE                        
ora.cssd                       ONLINE     ONLINE          grac41         
ora.cssdmonitor                ONLINE     ONLINE          grac41         
ora.ctssd                      ONLINE     ONLINE          grac41       OBSERVER  
ora.diskmon                    OFFLINE    OFFLINE                        
ora.drivers.acfs               ONLINE     ONLINE          grac41         
ora.evmd                       ONLINE     INTERMEDIATE    grac41         
ora.gipcd                      ONLINE     ONLINE          grac41         
ora.gpnpd                      ONLINE     ONLINE          grac41         
ora.mdnsd                      ONLINE     ONLINE          grac41         
--> Both resoruces ora.evmd and ora.asm reports their state in INTERMEDIATE - CRSD/EVMD doesn't come up

# ps -elf | egrep "PID|d.bin|ohas|oraagent|orarootagent|cssdagent|cssdmonitor" | grep -v grep
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S root      6098     1  0  80   0 -  2846 pipe_w 04:44 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
4 S root     10132     1  1  80   0 - 179077 futex_ 10:09 ?       00:00:07 /u01/app/11204/grid/bin/ohasd.bin reboot
4 S grid     10295     1  0  80   0 - 175970 futex_ 10:09 ?       00:00:03 /u01/app/11204/grid/bin/oraagent.bin
0 S grid     10306     1  0  80   0 - 74289 poll_s 10:09 ?        00:00:00 /u01/app/11204/grid/bin/mdnsd.bin
0 S grid     10317     1  0  80   0 - 127382 hrtime 10:09 ?       00:00:01 /u01/app/11204/grid/bin/gpnpd.bin
0 S grid     10328     1  1  80   0 - 159711 hrtime 10:09 ?       00:00:04 /u01/app/11204/grid/bin/gipcd.bin
4 S root     10330     1  1  80   0 - 168735 futex_ 10:09 ?       00:00:04 /u01/app/11204/grid/bin/orarootagent.bin
4 S root     10344     1  6 -40   - - 160907 hrtime 10:09 ?       00:00:25 /u01/app/11204/grid/bin/osysmond.bin
4 S root     10355     1  0 -40   - - 162793 futex_ 10:09 ?       00:00:01 /u01/app/11204/grid/bin/cssdmonitor
4 S root     10374     1  0 -40   - - 162921 futex_ 10:09 ?       00:00:01 /u01/app/11204/grid/bin/cssdagent
4 S grid     10385     1  4 -40   - - 166593 futex_ 10:09 ?       00:00:17 /u01/app/11204/grid/bin/ocssd.bin 
4 S root     10733     1  0 -40   - - 127270 poll_s 10:09 ?       00:00:03 /u01/app/11204/grid/bin/ologgerd 
4 S root     11286     1  0  80   0 - 159388 futex_ 10:09 ?       00:00:03 /u01/app/11204/grid/bin/octssd.bin reboot
0 S grid     11307     1  0  80   0 - 75351 hrtime 10:09 ?        00:00:00 /u01/app/11204/grid/bin/evmd.bin

Tracefile Details :
alertgrac41.log:
2014-05-24 10:13:48.085: 
[/u01/app/11204/grid/bin/oraagent.bin(10295)]CRS-5019:All OCR locations are on ASM disk groups [OCR3], 
                                                      and none of these disk groups are mounted. 
              Details are at "(:CLSN00100:)"   in ".../agent/ohasd/oraagent_grid/oraagent_grid.log".
2014-05-24 10:14:18.144: 
[/u01/app/11204/grid/bin/oraagent.bin(10295)]CRS-5019:All OCR locations are on ASM disk groups [OCR3], 
                                                      and none of these disk groups are mounted. 
           Details are at "(:CLSN00100:)" in ".../agent/ohasd/oraagent_grid/oraagent_grid.log".

ohasd/oraagent_grid/oraagent_grid.log
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] AsmAgent::check ocrCheck 1 m_OcrOnline 0 m_OcrTimer 30
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::initOcrDgpSet { entry
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::initOcrDgpSet procr_get_conf: retval [0] 
                                                                                      configured [1] local only [0] error buffer []
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::initOcrDgpSet procr_get_conf: OCR loc [0], Disk Group : [+OCR3]
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::initOcrDgpSet m_ocrDgpSet 02d965f8 dgName OCR3
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::initOcrDgpSet ocrret 0 found 1
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::initOcrDgpSet ocrDgpSet OCR3 
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::initOcrDgpSet exit }
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::ocrDgCheck Entry {
2014-05-24 10:10:47.848: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::getConnxn connected
2014-05-24 10:10:47.850: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::queryDgStatus excp no data found
2014-05-24 10:10:47.850: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2014-05-24 10:10:47.851: [ora.asm][4179105536]{0:0:2} [check] DgpAgent::queryDgStatus dgName OCR3 ret 1
2014-05-24 10:10:47.851: [ora.asm][4179105536]{0:0:2} [check] (:CLSN00100:)DgpAgent::ocrDgCheck OCR dgName OCR3 state 1
2014-05-24 10:10:47.851: [ora.asm][4179105536]{0:0:2} [check] AsmAgent::check ocrCheck 2 m_OcrOnline 0 m_OcrTimer 31
2014-05-24 10:10:47.851: [ora.asm][4179105536]{0:0:2} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2014-05-24 10:10:48.053: [ COMMCRS][3665618688]clsc_connect: (0x7f77d4101f30) no listener at (ADDRESS=(PROTOCOL=IPC)(KEY=CRSD_UI_SOCKET))

Debugging steps
Identify OCR  ASM DG on a running instance with asmcmd
[grid@grac42 ~]$  asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB    Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576      3057                0             N  ASMLIB_DG/
MOUNTED  NORMAL  N         512   4096  1048576     30708                0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576     40946                0             N  FRA/
MOUNTED  NORMAL  N         512   4096  1048576      6141                0             OCR/
MOUNTED  NORMAL  N         512   4096  1048576      3057                0             N  SSD/
--> diskgroup OCR is serving our voting disks ( and not DG +OCR3 )

Check availale disks:
[grid@grac42 ~]$ asmcmd lsdsk -k
Total_MB  Free_MB  OS_MB  Name            Failgroup       Failgroup_Type  Library   Path
   10236     2809  10236  DATA_0001       DATA_0001       REGULAR         System    /dev/asmdisk1_udev_sdc1
   10236     2804  10236  DATA_0002       DATA_0002       REGULAR         System    /dev/asmdisk1_udev_sdd1
   10236     2804  10236  DATA_0003       DATA_0003       REGULAR         System    /dev/asmdisk1_udev_sde1
    2047     1696   2047  OCR_0000        OCR_0000        REGULAR         System    /dev/asmdisk1_udev_sdf1
    2047     1696   2047  OCR_0001        OCR_0001        REGULAR         System    /dev/asmdisk1_udev_sdg1
    2047     1697   2047  OCR_0002        OCR_0002        REGULAR         System    /dev/asmdisk1_udev_sdh1
    1019      875   1019  SSD_0000        SSD_0000        REGULAR         System    /dev/asmdisk8_ssd1
    1019      875   1019  SSD_0001        SSD_0001        REGULAR         System    /dev/asmdisk8_ssd2
    1019      875   1019  SSD_0002        SSD_0002        REGULAR         System    /dev/asmdisk8_ssd3
   20473    10467  20473  FRA_0001        FRA_0001        REGULAR         System    /dev/asmdisk_fra1
   20473    10461  20473  FRA_0002        FRA_0002        REGULAR         System    /dev/asmdisk_fra2
    1019      882   1019  ASMLIB_DG_0000  ASMLIB_DG_0000  REGULAR         System    /dev/oracleasm/disks/ASMLIB_DISK1
    1019      882   1019  ASMLIB_DG_0001  ASMLIB_DG_0001  REGULAR         System    /dev/oracleasm/disks/ASMLIB_DISK2
    1019      882   1019  ASMLIB_DG_0002  ASMLIB_DG_0002  REGULAR         System    /dev/oracleasm/disks/ASMLIB_DISK3
-->  /dev/asmdisk1_udev_sdf1,  /dev/asmdisk1_udev_sdg1 , /dev/asmdisk1_udev_sdh1 are 
     our disk serving the voting files

Verify on a working system the voting disk 
[grid@grac42 ~]$  kfed read  /dev/asmdisk1_udev_sdf1  | egrep 'vf|name'
kfdhdb.dskname:                OCR_0000 ; 0x028: length=8
kfdhdb.grpname:                     OCR ; 0x048: length=3
kfdhdb.fgname:                 OCR_0000 ; 0x068: length=8
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.vfstart:                     320 ; 0x0ec: 0x00000140
kfdhdb.vfend:                       352 ; 0x0f0: 0x00000160
Note: If the markers between vfstart & vfend are not 0 then disk does contain voting disk !

Check OCR with ocrcheck
[grid@grac41 ~]$ ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage

Tracing ocrcheck              
[grid@grac41 ~]$ strace -f -o ocrcheck.trc ocrcheck
[grid@grac41 ~]$ grep open ocrcheck.trc  | grep ocr.loc
17530 open("/etc/oracle/ocr.loc", O_RDONLY) = 6
--> ocrcheck reads /etc/oracle/ocr.loc
[grid@grac41 ~]$ cat /etc/oracle/ocr.loc
ocrconfig_loc=+OCR3
local_only=false
==> +OCR3 is a wrong entry 

Verify CW state with cluvfy
[grid@grac42 ~]$ cluvfy comp ocr -n grac42,grac41
Verifying OCR integrity 
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR: 
PRVF-4193 : Asm is not running on the following nodes. 
            Proceeding with the remaining nodes.
    grac41
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
ERROR: 
PRVF-4195 : Disk group for ocr location "+OCR" not available on the following nodes:
    grac41
NOTE: 
This check does not verify the integrity of the OCR contents. 
Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check failed
Verification of OCR integrity was unsuccessful. 
Checks did not pass for the following node(s):
    grac41