Friday, August 12, 2016

ologgerd daemon not started: Error CRS-9011 running oclumon




Cluster Logger Service (ologgerd)
This process is one of the components of the Cluster Health Monitor (CHM).
There is only one master ologgerd and one replica ologgerd per cluster.
The master ologgerd receives the data from the other nodes and saves it in the repository. It compresses the data before persisting it, to save disk space.
In an environment with multiple nodes, a replica ologgerd is also started on a node where the master ologgerd is not running.
The master ologgerd keeps the replica ologgerd in sync by sending it the data.
The replica ologgerd takes over if the master ologgerd dies.
A new replica ologgerd starts when the replica ologgerd dies.


The steps below use crsctl and oclumon to locate the problem, fix it, and make sure oclumon is up and running again.

Locate CHM log directory
Check CHM resource status and locate Master Node
[grid@grac41 ~]$ $GRID_HOME/bin/crsctl status res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=ONLINE
STATE=ONLINE on grac41
[grid@grac41 ~]$ oclumon manage -get MASTER
Master = grac43

Log in to grac43 and locate the CHM log directory (from the ologgerd process arguments)
[root@grac43 ~]#  ps -elf |grep ologgerd | grep -v grep
 .... /u01/app/11204/grid/bin/ologgerd -M -d /u01/app/11204/grid/crf/db/grac43
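
The two lookups above can be scripted. The following is only a minimal sketch, not part of the original procedure: the function names chm_master and chm_db_dir are made up for illustration, and the parsing assumes the output formats shown above (a "Master = <node>" line and an ologgerd command line that contains "-d <directory>").

```shell
# Hypothetical helpers; the names are illustrative, not Oracle tools.

# Print the node name from "oclumon manage -get MASTER" output on stdin.
chm_master() {
    sed -n 's/^Master[[:space:]]*=[[:space:]]*//p'
}

# Print the directory that follows "-d" on an ologgerd command line.
# Reads ps output on stdin; lines without an ologgerd binary are ignored.
chm_db_dir() {
    awk '{
        seen = 0
        for (i = 1; i < NF; i++) {
            if ($i ~ /\/ologgerd$/) seen = 1
            if (seen && $i == "-d") { print $(i + 1); break }
        }
    }'
}

# Example use on a cluster node (assumes oclumon is in PATH and ssh access):
#   master=$(oclumon manage -get MASTER | chm_master)
#   ssh "$master" ps -elf | chm_db_dir
```

Matching on the ologgerd binary itself, rather than on the word "ologgerd" anywhere in the line, avoids the extra "grep -v grep" step.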

Comparison of OSWatcher and CHM
  • CHM CPU overhead for a single run is lower than that of OSWatcher, as CHM does not use iostat or vmstat to collect data
  • OSWatcher runs with user priority, whereas CHM runs with RT (real-time) priority (CHM should be able to collect data even under CPU starvation)
  • OSWatcher does a better job of tracing network-related stats with tools like top, traceroute, and netstat
  • TFA can reduce the number of uploaded files

How to start and stop CHM that is installed as a part of GI in 11.2 and higher
Starting and stopping the ora.crf resource starts and stops CHM.
Check status:
$GRID_HOME/bin/crsctl status res ora.crf -init

To stop CHM (or ora.crf resource managed by ohasd)
$GRID_HOME/bin/crsctl stop res ora.crf -init

To start CHM (or ora.crf resource managed by ohasd)
$GRID_HOME/bin/crsctl start res ora.crf -init

Check status on a specific node:
$ ssh grac42 $GRID_HOME/bin/crsctl status res ora.crf -init | grep STATE
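
To check several nodes in one pass, the single-node command can be wrapped in a loop. This is a rough sketch under stated assumptions: crf_state and crf_state_all are hypothetical names, passwordless ssh is available, and GRID_HOME points to the same path on every node (it is expanded on the local side, as in the command above).

```shell
# Hypothetical helpers; the names are illustrative, not Oracle tools.

# Extract the STATE value from "crsctl status res ora.crf -init" output on stdin.
crf_state() {
    sed -n 's/^STATE=//p'
}

# Print one "node: state" line for each node name given as an argument.
# Assumes passwordless ssh and an identical GRID_HOME path on all nodes.
crf_state_all() {
    for node in "$@"; do
        state=$(ssh "$node" "$GRID_HOME/bin/crsctl status res ora.crf -init" </dev/null | crf_state)
        echo "$node: ${state:-unknown}"
    done
}

# Example use:
#   crf_state_all grac41 grac42 grac43
```

A node that cannot be reached, or that returns no STATE line, is reported as "unknown" rather than being skipped silently.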


Error CRS-9011 running oclumon – ologgerd daemon not started
$ oclumon dumpnodeview -n grac41 -last "00:15:00"
CRS-9011-Error dumpnodeview: Failed to initialize connection to the Cluster Logger Service
$ ps -ef | egrep "sysmond|loggerd"
root      3820     1  2 Feb20 ?        00:26:15 /u01/app/11204/grid/bin/osysmond.bin
--> ologgerd daemon is not running
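
The same check can be turned into a yes/no test for use in a script. This is a minimal sketch; ologgerd_running is a hypothetical helper name, and it only assumes that the ologgerd binary appears as a separate field in the ps output, as in the listings in this post.

```shell
# Hypothetical helper; the name is illustrative, not an Oracle tool.

# Succeed (exit status 0) if the ps output on stdin contains an ologgerd process.
# A field must be "ologgerd" or end in "/ologgerd", so the egrep process itself
# (whose argument is "sysmond|loggerd") is not counted.
ologgerd_running() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /(^|\/)ologgerd$/) found = 1
    }
    END { exit !found }'
}

# Example use:
#   if ps -ef | ologgerd_running; then
#       echo "ologgerd is running"
#   else
#       echo "ologgerd is NOT running"
#   fi
```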

Fix:

1. Stop ora.crf as root user on all nodes
# /u01/app/11204/grid/bin/crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'grac41'
CRS-2677: Stop of 'ora.crf' on 'grac41' succeeded

2. Comment out the "BDBSIZE" entry in the file $GRID_HOME/crf/admin/crfgrac41.ora and save the changes.

3. Start the ora.crf resource on all nodes
# /u01/app/11204/grid/bin/crsctl  start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'grac41'
CRS-2676: Start of 'ora.crf' on 'grac41' succeeded

4. Verify that ologgerd daemon is running
#  ps -ef | egrep "sysmond|loggerd"
root     27213     1  4 11:22 ?        00:00:00 /u01/app/11204/grid/bin/osysmond.bin
root     27227     1  4 11:22 ?        00:00:00 /u01/app/11204/grid/bin/ologgerd -M -d /u01/app/11204/grid/crf/db/grac41
root     27243 20061  0 11:22 pts/7    00:00:00 egrep sysmond|loggerd
--> ologgerd daemon is now running

5. Verify oclumon is working now
$ oclumon manage -get MASTER
Master = grac41
 Done
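
Step 2 of the fix can also be done non-interactively. The sketch below rests on assumptions that the steps above do not confirm: that the entries in the crf file are KEY=value lines, and that a leading "#" marks a line as a comment. The function name comment_out_bdbsize is hypothetical. Check the file by hand before relying on it.

```shell
# Hypothetical helper; the name is illustrative, not an Oracle tool.

# Comment out the BDBSIZE entry in the given file, keeping a copy as <file>.bak.
# Assumption: entries are KEY=value lines and a leading "#" marks a comment.
# Lines that are already commented out are left unchanged.
comment_out_bdbsize() {
    sed -i.bak 's/^[[:space:]]*BDBSIZE[[:space:]]*=/#&/' "$1"
}

# Example use, as root, with ora.crf stopped:
#   comment_out_bdbsize "$GRID_HOME/crf/admin/crfgrac41.ora"
```

The .bak copy makes it easy to restore the original file if ora.crf does not start afterwards.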