OHASD stack daemons like gpnpd.bin are not starting up (the same method applies to mdnsd.bin and gipcd.bin)
--> GPnP daemon remains in STARTING mode
ora.gpnpd ONLINE OFFLINE STARTING
Case 1 : Wrong protection for executable $GRID_HOME/bin/gpnpd.bin
==> For additional details see the GENERIC File Protection chapter
Reported error in oraagent_grid.log : [ clsdmc][1103787776]Fail to connect (ADDRESS=(PROTOCOL=ipc)
(KEY=grac41DBG_GPNPD)) with status 9
[ora.gpnpd][1103787776]{0:0:2} [start] Error = error 9 encountered
when connecting to GPNPD
Reported Clusterware Error in CW alert.log: [/u01/app/11204/grid/bin/oraagent.bin(20333)]
CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'.
Details at (:CRSAGF00113:) {0:0:2} in ..... ohasd/oraagent_grid/oraagent_grid.log
Testing scenario :
[grid@grac41 ~]$ chmod 444 $GRID_HOME/bin/gpnpd.bin
[grid@grac41 ~]$ ls -l $GRID_HOME/bin/gpnpd.bin
-r--r--r--. 1 grid oinstall 368780 Mar 19 17:07 /u01/app/11204/grid/bin/gpnpd.bin
[root@grac41 gpnp]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Clusterware status :
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
$ crsi
NAME TARGET STATE SERVER STATE_DETAILS
------------------------- ---------- ---------- ------------ ------------------
ora.asm ONLINE OFFLINE Instance Shutdown
ora.cluster_interconnect.haip ONLINE OFFLINE
ora.crf ONLINE OFFLINE
ora.crsd ONLINE OFFLINE
ora.cssd ONLINE OFFLINE
ora.cssdmonitor OFFLINE OFFLINE
ora.ctssd ONLINE OFFLINE
ora.diskmon OFFLINE OFFLINE
ora.drivers.acfs ONLINE OFFLINE
ora.evmd ONLINE OFFLINE
ora.gipcd ONLINE OFFLINE
ora.gpnpd ONLINE OFFLINE STARTING
ora.mdnsd ONLINE ONLINE grac41
--> GPnP daemon remains in STARTING mode
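The resource table above can be filtered for resources that have not reached their target state. A minimal sketch, assuming the table is available as plain text with the column layout shown above (NAME TARGET STATE SERVER STATE_DETAILS); pending_resources is a hypothetical helper name, not a Clusterware tool:

```shell
# Print resources whose TARGET is ONLINE but whose STATE is not ONLINE.
# Reads the resource table from stdin; header and separator lines are skipped
# because they do not start with "ora.".
# pending_resources is a hypothetical helper, not part of Clusterware.
pending_resources() {
    awk '$1 ~ /^ora\./ && $2 == "ONLINE" && $3 != "ONLINE" {
        line = $1
        # Append STATE and whatever follows it (SERVER, STATE_DETAILS).
        for (i = 3; i <= NF; i++) line = line " " $i
        print line
    }'
}
```

Usage, with the table saved to a file (the file name is only an example): pending_resources < crs_status.txt. For the table above this would list ora.gpnpd together with its STARTING detail, plus the resources that are still waiting behind it.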
$ ps -elf | egrep "PID|d.bin|ohas" | grep -v grep
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S root 6098 1 0 80 0 - 2846 pipe_w 04:44 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
4 S root 20127 1 23 80 0 - 176890 futex_ 11:52 ? 00:00:51 /u01/app/11204/grid/bin/ohasd.bin reboot
4 S grid 20333 1 0 80 0 - 166464 futex_ 11:52 ? 00:00:02 /u01/app/11204/grid/bin/oraagent.bin
0 S grid 20344 1 0 80 0 - 74289 poll_s 11:52 ? 00:00:00 /u01/app/11204/grid/bin/mdnsd.bin
Review Tracefile :
alertgrac41.log
[/u01/app/11204/grid/bin/oraagent.bin(27632)]CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'.
Details at (:CRSAGF00113:) {0:0:2} in ... agent/ohasd/oraagent_grid/oraagent_grid.log.
2014-05-12 10:27:51.747:
[ohasd(27477)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.gpnpd'.
Details at (:CRSPE00111:) {0:0:2} in /u01/app/11204/grid/log/grac41/ohasd/ohasd.log
oraagent_grid.log
[ clsdmc][1103787776]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD)) with status 9
2014-05-12 10:27:17.476: [ora.gpnpd][1103787776]{0:0:2} [start] Error = error 9 encountered when connecting to GPNPD
2014-05-12 10:27:18.477: [ora.gpnpd][1103787776]{0:0:2} [start] without returnbuf
2014-05-12 10:27:18.659: [ COMMCRS][1125422848]clsc_connect: (0x7f3b300d92d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD))
ohasd.log
2014-05-12 10:27:51.745: [ AGFW][2693531392]{0:0:2} Received the reply to the message: RESOURCE_START[ora.gpnpd 1 1] ID 4098:362 from
the agent /u01/app/11204/grid/bin/oraagent_grid
2014-05-12 10:27:51.745: [ AGFW][2693531392]{0:0:2} Agfw Proxy Server sending the reply to PE for message:
RESOURCE_START[ora.gpnpd 1 1] ID 4098:361
2014-05-12 10:27:51.747: [ CRSPE][2212488960]{0:0:2} Received reply to action [Start] message ID: 361
2014-05-12 10:27:51.747: [ INIT][2212488960]{0:0:2} {0:0:2} Created alert : (:CRSPE00111:) : Start action timed out!
2014-05-12 10:27:51.747: [ CRSPE][2212488960]{0:0:2} Start action failed with error code: 3
2014-05-12 10:27:52.123: [ AGFW][2693531392]{0:0:2} Received the reply to the message: RESOURCE_START[ora.gpnpd 1 1] ID 4098:362 from the
agent /u01/app/11204/grid/bin/oraagent_grid
2014-05-12 10:27:52.123: [ AGFW][2693531392]{0:0:2} Agfw Proxy Server sending the last reply to PE for message:
RESOURCE_START[ora.gpnpd 1 1] ID 4098:361
2014-05-12 10:27:52.123: [ CRSPE][2212488960]{0:0:2} Received reply to action [Start] message ID: 361
2014-05-12 10:27:52.123: [ CRSPE][2212488960]{0:0:2} RI [ora.gpnpd 1 1] new internal state: [STABLE] old value: [STARTING]
2014-05-12 10:27:52.123: [ CRSPE][2212488960]{0:0:2} CRS-2674: Start of 'ora.gpnpd' on 'grac41' failed
--> Here we see that the start of the GPnP resource is failing
Debugging steps :
Is process gpnpd.bin running ?
[root@grac41 ~]# ps -elf | egrep "PID|gpnpd" | grep -v grep
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
--> Missing process gpnpd.bin
Restart CRS with strace support
# crsctl stop crs -f
# strace -t -f -o crs_startup.trc crsctl start crs
Check for EACCES errors and check the return values of the execve() and access() system calls :
[root@grac41 oracle]# grep EACCES crs_startup.trc
28301 12:19:46 access("/u01/app/11204/grid/bin/gpnpd.bin", X_OK) = -1 EACCES (Permission denied)
Review crs_startup.trc in more detail
27345 12:15:35 execve("/u01/app/11204/grid/bin/gpnpd.bin", ["/u01/app/11204/grid/bin/gpnpd.bi"...], [/* 73 vars */] <unfinished ...>
27238 12:15:35 <... lseek resumed> ) = 164864
27345 12:15:35 <... execve resumed> ) = -1 EACCES (Permission denied)
27345 12:15:35 access("/u01/app/11204/grid/bin/gpnpd.bin", X_OK <unfinished ...>
27238 12:15:35 <... read resumed> "\25\23\"\1\23\3\t\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256
27345 12:15:35 <... access resumed> ) = -1 EACCES (Permission denied)
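A plain grep for EACCES misses the path when strace splits a call into an "unfinished" and a "resumed" line, as in the excerpt above. A minimal sketch that joins the two halves per PID, assuming the trace was written with strace -f -o (PID in the first column); eacces_paths is a hypothetical helper name:

```shell
# List the unique paths for which a system call failed with EACCES.
# $1: strace output file written with "strace -f -o <file>", so that the
#     first field of every line is the PID.
# eacces_paths is a hypothetical helper, not part of strace.
eacces_paths() {
    awk '
        /<unfinished \.\.\.>/ {
            # Remember the first quoted argument (the path) of the pending call.
            pending[$1] = match($0, /"[^"]+"/) ? substr($0, RSTART + 1, RLENGTH - 2) : ""
            next
        }
        / resumed>/ {
            # The result of the pending call for this PID arrives here.
            if (/EACCES/ && pending[$1] != "") print pending[$1]
            delete pending[$1]
            next
        }
        /EACCES/ && match($0, /"[^"]+"/) {
            # Call and result on a single line.
            print substr($0, RSTART + 1, RLENGTH - 2)
        }
    ' "$1" | sort -u
}
```

Usage: eacces_paths crs_startup.trc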
Verify problem with cluvfy
[grid@grac41 ~]$ cluvfy comp software -verbose
Verifying software
Check: Software
Component: crs
Node Name: grac41
..
Permissions of file "/u01/app/11204/grid/bin/gpnpd.bin" did not match the expected value.
[Expected = "0755" ; Found = "0444"]
/u01/app/11204/grid/bin/gpnptool.bin..."Permissions" did not match reference
..
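The same comparison can be done by hand for a single file. A minimal sketch, assuming GNU stat (Linux) and taking the expected mode 0755 from the cluvfy output above; check_mode is a hypothetical helper name, and using the same expected mode for other daemon binaries is an assumption:

```shell
# Compare the permission bits of a file with an expected octal mode.
# $1: file, $2: expected mode as printed by "stat -c %a" (e.g. 755)
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read by stat.
# check_mode is a hypothetical helper; "stat -c" requires GNU coreutils.
check_mode() {
    local file=$1 expected=$2 found
    found=$(stat -c '%a' "$file") || return 2
    if [ "$found" != "$expected" ]; then
        echo "MISMATCH $file expected=$expected found=$found"
        return 1
    fi
    echo "OK $file mode=$found"
}
```

Usage: check_mode $GRID_HOME/bin/gpnpd.bin 755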
WHAT IS GPNP PROFILE?
The GPnP profile is a small XML file located in
GRID_HOME/gpnp/<hostname>/profiles/peer under the name profile.xml.
It is used to establish the correct global personality of a node.
Each node keeps a local copy of the GPnP profile, which is
maintained by the GPnP daemon (GPnPD).
WHAT DOES GPNP PROFILE CONTAIN?
The GPnP profile stores the information required for the startup of
Oracle Clusterware, such as the ASM SPFILE location and the ASM disk string.
It contains various attributes defining the node personality :
- Cluster name
- Network classifications (Public/Private)
- Storage to be used for CSS
- Storage to be used for ASM : SPFILE location, ASM DiskString etc.
- Digital signature information : The profile is security sensitive. It might
  identify the storage to be used as the root partition of a machine. Hence, it
  contains digital signature information of the provisioning authority.
Here is the GPnP profile of my RAC setup.
gpnptool can be used for reading/editing the gpnp profile.
[root@host01 peer]# gpnptool get
Next, CRSD needs to read the OCR to start up the various resources on the node,
and it updates the OCR as the status of resources changes. Since the OCR is also
stored in ASM, the location of the ASM SPfile must be known first.
The ASM SPfile is searched for in the following order :
- GPnP profile
- ORACLE_HOME/dbs/spfile<sid>.ora
- ORACLE_HOME/dbs/init<sid>.ora
In a cluster environment, the location of the ASM SPfile is read from the GPnP profile.
[grid@host01 peer]$ gpnptool getpval -asm_spf
Warning: some command line parameters were defaulted. Resulting command line:
/u01/app/11.2.0/grid/bin/gpnptool.bin getpval -asm_spf -p=profile.xml -o-
+DATA/cluster01/asmparameterfile/registry.253.793721441
The output of the query shows that the SPfile is on ASM in the DATA diskgroup. To find
the location of the ASM disks, the following query is issued :
[root@host01 peer]# gpnptool getpval -asm_dis
ASM-Profile id="asm" DiscoveryString=""
The headers of every device matching the disk string returned by the above
query are scanned (if a disk string was configured at ASM initial setup time).
Here the discovery string is blank because the ASM_DISKSTRING parameter has not
been set. Hence, the headers of all the ASM disks are scanned.
Here, I show the output of the query only for the disk which contains the SPfile (spfflg is not null).
[root@host01 ~]# kfed read /dev/sdb3 | grep -E 'spf|ausize'
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile: 16 ; 0x0f4: 0x00000010
kfdhdb.spfflg: 1 ; 0x0f8: 0x00000001
In the output above, we see that the device /dev/sdb3 contains a copy of the
ASM spfile (spfflg=1). The ASM spfile starts at allocation unit 16 of the disk (spfile=16).
Considering the allocation unit size (kfdhdb.ausize = 1M), let's dump the ASM spfile from the device:
[root@host01 ~]# dd if=/dev/sdb3 of=spfileASM_Copy2.ora skip=16 bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.170611 seconds, 6.1 MB/s
[root@host01 ~]# strings spfileASM_Copy2.ora
+ASM1.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM2.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM3.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM3.asm_diskgroups='FRA'#Manual Mount
+ASM2.asm_diskgroups='FRA'#Manual Mount
+ASM1.asm_diskgroups='FRA'#Manual Mount
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
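The dd arguments used above follow directly from the kfed header fields: bs is the allocation unit size (ausize) and skip is the allocation unit number where the spfile starts (spfile). A minimal sketch that derives them from the kfed output, assuming the field layout shown above and, as in the dd example, that one allocation unit is enough (count=1); spfile_dd_args is a hypothetical helper name:

```shell
# Turn kfed header fields (read from stdin) into dd arguments.
# Expects lines such as "kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000".
# Fails (exit status 1) if a field is missing or spfflg is 0, i.e. the
# header does not indicate an spfile copy on this disk.
# spfile_dd_args is a hypothetical helper, not part of kfed.
spfile_dd_args() {
    awk '
        $1 == "kfdhdb.ausize:" { ausize = $2 }   # allocation unit size in bytes
        $1 == "kfdhdb.spfile:" { au = $2 }       # allocation unit where the spfile starts
        $1 == "kfdhdb.spfflg:" { flg = $2 }      # non-zero: disk holds an spfile copy
        END {
            if (ausize == "" || au == "" || flg + 0 == 0) exit 1
            printf "bs=%d skip=%d count=1\n", ausize, au
        }'
}
```

Usage: dd if=/dev/sdb3 of=spfileASM_Copy2.ora $(kfed read /dev/sdb3 | spfile_dd_args)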
Using the parameters in the SPfile, ASM is started.
Once ASM is up, the OCR is read by CRSD and the various resources on the node are started.
Each node reads the network information in the GPnP profile and, using GNS,
negotiates an appropriate network identity for itself. Hence, nodes can be
dynamically added/deleted.
————————————————————
GPNPTOOL COMMAND REFERENCE:
- How to read the profile
[root@inssc3 bin]# ./gpnptool get
- How to check whether the GPnP daemon is running on the local node
[root@host01 peer]# gpnptool lfind
Success. Local gpnpd found.
- How to find the location of ASM spfile if the ASM is down
[root@host01 peer]# gpnptool getpval -asm_spf
+DATA/cluster01/asmparameterfile/registry.253.783619911
- How to find all RD-discoverable resources of given type
[root@host01 peer]# gpnptool find
Found 3 instances of service 'gpnp'.
mdns:service:gpnp._tcp.local.://host03:18015/agent=gpnpd,cname=cluster01,host=host03,pid=5066/gpnpd h:host03 c:cluster01
mdns:service:gpnp._tcp.local.://host02:17637/agent=gpnpd,cname=cluster01,host=host02,pid=5236/gpnpd h:host02 c:cluster01
mdns:service:gpnp._tcp.local.://host01:16633/agent=gpnpd,cname=cluster01,host=host01,pid=5206/gpnpd h:host01 c:cluster01
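The "gpnptool find" output above can be reduced to one short line per discovered daemon. A minimal sketch, assuming the mdns line format shown above; gpnp_peers is a hypothetical helper name, and reading the number after the host name as a port is an assumption based on the URL form:

```shell
# Reduce "gpnptool find" output (read from stdin) to "host number pid" lines.
# The number after the host name looks like a port; that reading is an
# assumption. Lines that do not match the mdns format are dropped.
# gpnp_peers is a hypothetical helper, not part of gpnptool.
gpnp_peers() {
    sed -n 's|^mdns:service:gpnp\._tcp\.local\.://\([^:/]*\):\([0-9]*\)/.*,pid=\([0-9]*\)/.*|\1 \2 \3|p'
}
```

Usage: gpnptool find | gpnp_peers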