Friday, August 12, 2016

OHASD Agents do not start - oraagent, orarootagent, cssdagent / cssdmonitor

OHASD Agents do not start

ohasd.bin spawns four agents/monitors to start its resources:
  • oraagent: responsible for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd, etc.
  • orarootagent: responsible for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs, etc.
  • cssdagent / cssdmonitor: responsible for ora.cssd (for ocssd.bin) and ora.cssdmonitor (for cssdmonitor itself)
If ohasd.bin cannot start any of the above agents properly, Clusterware will not reach a healthy state.
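When a single resource does not come up, the first question is which agent log to read. The list above can be kept as a small lookup. This is a minimal bash sketch; the function name agent_for_resource is hypothetical, and it only knows the resources named in the list (the "etc." entries are not covered), so anything else is reported as unknown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an OHASD-managed resource to the agent that starts it.
# Only the resources listed in this post are covered; anything else prints
# "unknown" and returns 1.
agent_for_resource() {
  case "$1" in
    ora.asm|ora.evmd|ora.gipcd|ora.gpnpd|ora.mdnsd)
      echo "oraagent" ;;
    ora.crsd|ora.ctssd|ora.diskmon|ora.drivers.acfs)
      echo "orarootagent" ;;
    ora.cssd)
      echo "cssdagent" ;;
    ora.cssdmonitor)
      echo "cssdmonitor" ;;
    *)
      echo "unknown"
      return 1 ;;
  esac
}
```

Example: agent_for_resource ora.ctssd prints orarootagent, so the orarootagent trace is the one to look at.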
 
Potential Problems
1. A common cause of agent failure is that the log file or log directory for the agents does not have the proper ownership or permissions.
2. If an agent binary (oraagent.bin, orarootagent.bin, etc.) is corrupted, the agent will not start, and the resources it manages will not come up.
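Both causes can be checked from the shell before digging into agent traces. A minimal bash sketch follows; check_dir_owner and same_checksum are hypothetical helper names. The expected owner (for example, the OS user that runs the agent) is an assumption you must supply for your own installation, and the reference checksum is assumed to come from the same binary on a healthy node with the same patch level. GNU stat and sha256sum (Linux) are assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the two failure causes listed above.

# check_dir_owner DIR OWNER
# Succeeds if DIR exists, is a directory, is owned by OWNER and is writable by
# the calling user. Prints the reason to stderr otherwise.
check_dir_owner() {
  local dir=$1 owner=$2 actual
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  actual=$(stat -c %U "$dir")   # GNU stat: %U = owner user name
  if [ "$actual" != "$owner" ]; then
    echo "wrong owner: $dir is owned by $actual, expected $owner" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable by uid $(id -u): $dir" >&2
    return 1
  fi
}

# same_checksum FILE REFERENCE_SHA256
# Succeeds if the SHA-256 of FILE equals a checksum taken from a known-good copy.
same_checksum() {
  local file=$1 expected=$2 actual
  actual=$(sha256sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}
```

The writable test is only meaningful when run as the agent's OS user; root passes it for every directory.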

Debugging CRS startup if the trace file location is not accessible
Action - rename the trace directory
[grid@grac41 log]$ mv  $GRID_HOME/log/grac41 $GRID_HOME/log/grac41_nw
[grid@grac41 log]$ crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
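The test above fails only because $GRID_HOME/log/grac41 was renamed. A pre-flight check can catch this before crsctl start crs is run. This is a minimal bash sketch; ohasd_log_dir_ok is a hypothetical name, and it assumes the 11.2 layout shown in this post ($GRID_HOME/log/<host>/ohasd, where <host> is the name used for the directory, grac41 here). It also lists sibling directories whose name starts with the host name, which would point at a renamed copy such as grac41_nw.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify that the OHASD trace directory exists.
# Assumes the layout used in this post: $GRID_HOME/log/<host>/ohasd
ohasd_log_dir_ok() {
  local grid_home=$1 host=$2 d
  local dir="$grid_home/log/$host/ohasd"
  if [ -d "$dir" ]; then
    return 0
  fi
  echo "missing: $dir" >&2
  # Report directories that look like a renamed copy, e.g. grac41_nw.
  for d in "$grid_home/log/$host"?*; do
    [ -d "$d" ] && echo "candidate renamed directory: $d" >&2
  done
  return 1
}
```

Usage (as root): ohasd_log_dir_ok "$GRID_HOME" grac41 && crsctl start crs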

Process Status and CRS status
[root@grac41 .oracle]# ps -elf | egrep "PID|d.bin|ohas|oraagent|orarootagent|cssdagent|cssdmonitor" | grep -v grep
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S root      5396     1  0  80   0 -  2847 pipe_w 10:52 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
4 S root     26705 25370  1  80   0 - 47207 hrtime 14:05 pts/7    00:00:00 /u01/app/11204/grid/bin/crsctl.bin start crs
[root@grac41 .oracle]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
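The ps | egrep check above can be turned into a yes/no answer per process. This is a minimal bash sketch; missing_processes is a hypothetical helper that reads the process listing on stdin, so it can also be run against saved ps output. It does a plain substring match, so a name like cssdagent also matches any command line that merely contains that string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a process listing on stdin (for example from
# "ps -eo args=") and report which of the given names do not appear in it.
# Exit status is 0 only if every name was found.
missing_processes() {
  local procs name rc=0
  procs=$(cat)
  for name in "$@"; do
    if ! grep -F -q -- "$name" <<< "$procs"; then
      echo "not running: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: ps -eo args= | missing_processes ohasd.bin oraagent.bin orarootagent.bin cssdagent cssdmonitor
With the process list shown above (only init.ohasd and crsctl.bin), all five names would be reported as not running.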

OS Tracefile: /var/log/messages 
May 13 13:48:27 grac41 root: exec /u01/app/11204/grid/perl/bin/perl -I/u01/app/11204/grid/perl/lib 
                 /u01/app/11204/grid/bin/crswrapexece.pl 
  /u01/app/11204/grid/crs/install/s_crsconfig_grac41_env.txt /u01/app/11204/grid/bin/ohasd.bin "reboot"
May 13 13:48:27 grac41 OHASD[22203]: OHASD exiting; Directory /u01/app/11204/grid/log/grac41/ohasd not found 
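When OHASD exits this early, /var/log/messages may be the only place that states why. This is a minimal bash sketch; ohasd_exit_reasons is a hypothetical name, and the sed pattern assumes the exact "OHASD[pid]: OHASD exiting; <reason>" format of the line above. Reading /var/log/messages normally requires root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the reason text from "OHASD exiting" lines in a
# syslog file (default /var/log/messages), with trailing blanks removed.
ohasd_exit_reasons() {
  local file=${1:-/var/log/messages}
  sed -n 's/.*OHASD\[[0-9]*]: OHASD exiting; *//p' "$file" \
    | sed 's/[[:space:]]*$//'
}
```

On grac41 this would print "Directory /u01/app/11204/grid/log/grac41/ohasd not found".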
 
 
Debugging steps 
 
[root@grac41 gpnpd]# strace -f -o ohas.trc crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
[root@grac41 gpnpd]# grep ohasd ohas.trc
...
22203 execve("/u01/app/11204/grid/bin/ohasd.bin", ["/u01/app/11204/grid/bin/ohasd.bi"..., "reboot"], [/* 60 vars */]) = 0
22203 stat("/u01/app/11204/grid/log/grac41/ohasd", 0x7fff17d68f40) = -1 ENOENT (No such file or directory)
==> Directory /u01/app/11204/grid/log/grac41/ohasd is missing: stat() returns ENOENT. (A permission problem on the path would show up as EACCES instead.)
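Instead of grepping the whole strace output for ohasd, the failed file-system calls can be filtered directly. This is a minimal bash sketch; failed_paths is a hypothetical name. It keeps lines containing "= -1 <ERRNO>" and extracts the first quoted argument, so it only works for calls whose first argument is the path (stat, open, access, execve, etc.); openat-style calls and split "<unfinished ...>" lines are not parsed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the paths whose system calls failed with a given
# errno in an "strace -f -o FILE" output file (default errno: ENOENT).
failed_paths() {
  local trace=$1 errno=${2:-ENOENT}
  # Matches lines like:
  # 22203 stat("/path", 0x7fff...) = -1 ENOENT (No such file or directory)
  grep -E "= -1 ${errno} " "$trace" \
    | sed -n 's/^[0-9]* *[a-z0-9_]*("\([^"]*\)".*/\1/p' \
    | sort -u
}
```

Usage: failed_paths ohas.trc prints the missing directory; failed_paths ohas.trc EACCES lists paths that failed with a permission error instead.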

Using cluvfy comp olr
[grid@grac41 ~]$ cluvfy comp  olr

Verifying OLR integrity 
Checking OLR integrity...
Checking OLR config file...

ERROR: 
2014-05-17 18:26:41.576:  CLSD: A file system error occurred while attempting to create default permissions for 
                          file "/u01/app/11204/grid/log/grac41/alertgrac41.log" during alert open processing for 
                          process "client". Additional diagnostics: 
                          LFI-00133: Trying to create file /u01/app/11204/grid/log/grac41/alertgrac41.log 
                          that already exists. 
                          LFI-01517: open() failed(OSD return value = 13).  
2014-05-17 18:26:41.585:  CLSD: An error was encountered while attempting to 
                          open alert log "/u01/app/11204/grid/log/grac41/alertgrac41.log". 
                          Additional diagnostics: (:CLSD00155:) 2014-05-17 18:26:41.585:  

OLR config file check successful
Checking OLR file attributes...
ERROR: 
PRVF-4187 : OLR file check failed on the following nodes:
    grac41
    grac41:PRVF-4127 : Unable to obtain OLR location
/u01/app/11204/grid/bin/ocrcheck -config -local
<CV_CMD>/u01/app/11204/grid/bin/ocrcheck -config -local </CV_CMD><CV_VAL>2014-05-17 18:26:45.202: 
CLSD: A file system error occurred while attempting to create default permissions for file 
"/u01/app/11204/grid/log/grac41/alertgrac41.log" during alert open processing for process "client". 
Additional diagnostics: LFI-00133: Trying to create file /u01/app/11204/grid/log/grac41/alertgrac41.log 
that already exists.
LFI-01517: open() failed(OSD return value = 13).

2014-05-17 18:26:45.202: 
CLSD: An error was encountered while attempting to open alert log 
"/u01/app/11204/grid/log/grac41/alertgrac41.log". Additional diagnostics: (:CLSD00155:)
2014-05-17 18:26:45.202: 
CLSD: Alert logging terminated for process client. File name: "/u01/app/11204/grid/log/grac41/alertgrac41.log"
2014-05-17 18:26:45.202: 
CLSD: A file system error occurred while attempting to create default permissions for file 
"/u01/app/11204/grid/log/grac41/client/ocrcheck_7617.log" during log open processing for process "client". 
Additional diagnostics: LFI-00133: Trying to create file /u01/app/11204/grid/log/grac41/client/ocrcheck_7617.log 
that already exists.
LFI-01517: open() failed(OSD return value = 13).

2014-05-17 18:26:45.202: 
CLSD: An error was encountered while attempting to open log file 
"/u01/app/11204/grid/log/grac41/client/ocrcheck_7617.log". 
Additional diagnostics: (:CLSD00153:)
2014-05-17 18:26:45.202: 
CLSD: Logging terminated for process client. File name: "/u01/app/11204/grid/log/grac41/client/ocrcheck_7617.log"
Oracle Local Registry configuration is :
     Device/File Name         : /u01/app/11204/grid/cdata/grac41.olr
</CV_VAL><CV_VRES>0</CV_VRES><CV_LOG>Exectask: runexe was successful</CV_LOG><CV_ERES>0</CV_ERES>
OLR integrity check failed
Verification of OLR integrity was unsuccessf