
Friday, August 12, 2016

gpnpd.bin is not starting up --> GPnP daemon remains in STARTING mode

OHASD stack daemons such as gpnpd.bin are not starting up (the same method applies to mdnsd.bin and gipcd.bin)
--> GPnP daemon remains in STARTING mode
ora.gpnpd                      ONLINE     OFFLINE         STARTING       


Case 1 : Wrong protection for executable  $GRID_HOME/bin/gpnpd.bin
    ==> For additional details see the GENERIC File Protection chapter
Reported error in oraagent_grid.log         : [  clsdmc][1103787776]Fail to connect (ADDRESS=(PROTOCOL=ipc)
                                              (KEY=grac41DBG_GPNPD)) with status 9
                                              [ora.gpnpd][1103787776]{0:0:2} [start] Error = error 9 encountered 
                                              when connecting to GPNPD
Reported Clusterware Error in CW alert.log: [/u01/app/11204/grid/bin/oraagent.bin(20333)]
                                             CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'. 
                                             Details at (:CRSAGF00113:) {0:0:2} in  ..... ohasd/oraagent_grid/oraagent_grid.log 
Testing scenario :
[grid@grac41 ~]$ chmod 444 $GRID_HOME/bin/gpnpd.bin
[grid@grac41 ~]$ ls -l $GRID_HOME/bin/gpnpd.bin
-r--r--r--. 1 grid oinstall 368780 Mar 19 17:07 /u01/app/11204/grid/bin/gpnpd.bin
[root@grac41 gpnp]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Clusterware status :
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
$ crsi
NAME                           TARGET     STATE           SERVER       STATE_DETAILS   
-------------------------      ---------- ----------      ------------ ------------------
ora.asm                        ONLINE     OFFLINE                      Instance Shutdown 
ora.cluster_interconnect.haip  ONLINE     OFFLINE                        
ora.crf                        ONLINE     OFFLINE                        
ora.crsd                       ONLINE     OFFLINE                        
ora.cssd                       ONLINE     OFFLINE                        
ora.cssdmonitor                OFFLINE    OFFLINE                        
ora.ctssd                      ONLINE     OFFLINE                        
ora.diskmon                    OFFLINE    OFFLINE                        
ora.drivers.acfs               ONLINE     OFFLINE                        
ora.evmd                       ONLINE     OFFLINE                        
ora.gipcd                      ONLINE     OFFLINE                        
ora.gpnpd                      ONLINE     OFFLINE         STARTING       
ora.mdnsd                      ONLINE     ONLINE          grac41         
--> GPnP daemon remains in STARTING mode
$ ps -elf | egrep "PID|d.bin|ohas" | grep -v grep
4 S root      6098     1  0  80   0 -  2846 pipe_w 04:44 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
4 S root     20127     1 23  80   0 - 176890 futex_ 11:52 ?       00:00:51 /u01/app/11204/grid/bin/ohasd.bin reboot
4 S grid     20333     1  0  80   0 - 166464 futex_ 11:52 ?       00:00:02 /u01/app/11204/grid/bin/oraagent.bin
0 S grid     20344     1  0  80   0 - 74289 poll_s 11:52 ?        00:00:00 /u01/app/11204/grid/bin/mdnsd.bin
Review Tracefile : 
[/u01/app/11204/grid/bin/oraagent.bin(27632)]CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'. 
       Details at (:CRSAGF00113:) {0:0:2} in  ... agent/ohasd/oraagent_grid/oraagent_grid.log.
2014-05-12 10:27:51.747: 
[ohasd(27477)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.gpnpd'. 
       Details at (:CRSPE00111:) {0:0:2} in /u01/app/11204/grid/log/grac41/ohasd/ohasd.log 
[  clsdmc][1103787776]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD)) with status 9
2014-05-12 10:27:17.476: [ora.gpnpd][1103787776]{0:0:2} [start] Error = error 9 encountered when connecting to GPNPD
2014-05-12 10:27:18.477: [ora.gpnpd][1103787776]{0:0:2} [start] without returnbuf
2014-05-12 10:27:18.659: [ COMMCRS][1125422848]clsc_connect: (0x7f3b300d92d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD))
2014-05-12 10:27:51.745: [    AGFW][2693531392]{0:0:2} Received the reply to the message: RESOURCE_START[ora.gpnpd 1 1] ID 4098:362 from 
                                                       the agent /u01/app/11204/grid/bin/oraagent_grid
2014-05-12 10:27:51.745: [    AGFW][2693531392]{0:0:2} Agfw Proxy Server sending the reply to PE for message:
                                                       RESOURCE_START[ora.gpnpd 1 1] ID 4098:361
2014-05-12 10:27:51.747: [   CRSPE][2212488960]{0:0:2} Received reply to action [Start] message ID: 361
2014-05-12 10:27:51.747: [    INIT][2212488960]{0:0:2} {0:0:2} Created alert : (:CRSPE00111:) :  Start action timed out!
2014-05-12 10:27:51.747: [   CRSPE][2212488960]{0:0:2} Start action failed with error code: 3
2014-05-12 10:27:52.123: [    AGFW][2693531392]{0:0:2} Received the reply to the message: RESOURCE_START[ora.gpnpd 1 1] ID 4098:362 from the 
                                                       agent /u01/app/11204/grid/bin/oraagent_grid
2014-05-12 10:27:52.123: [    AGFW][2693531392]{0:0:2} Agfw Proxy Server sending the last reply to PE for message:
                                                       RESOURCE_START[ora.gpnpd 1 1] ID 4098:361
2014-05-12 10:27:52.123: [   CRSPE][2212488960]{0:0:2} Received reply to action [Start] message ID: 361
2014-05-12 10:27:52.123: [   CRSPE][2212488960]{0:0:2} RI [ora.gpnpd 1 1] new internal state: [STABLE] old value: [STARTING]
2014-05-12 10:27:52.123: [   CRSPE][2212488960]{0:0:2} CRS-2674: Start of 'ora.gpnpd' on 'grac41' failed
--> Here we see that the start of the GPnP resource is failing
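To pull the failed resource starts out of a long ohasd.log, a small filter can help. The following is a minimal sketch, assuming the CRS-2674 message format matches the excerpt above; the function name failed_starts is made up for illustration.

```shell
# failed_starts LOGFILE
# Prints "<resource> <node>" for every CRS-2674 start failure in LOGFILE.
# Hypothetical helper; assumes the message text looks like the excerpt above.
failed_starts() {
    sed -n "s/.*CRS-2674: Start of '\([^']*\)' on '\([^']*\)' failed.*/\1 \2/p" "$1" | sort -u
}
```

It could be called as failed_starts /u01/app/11204/grid/log/grac41/ohasd/ohasd.log, using the ohasd.log path from the alert message above.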
Debugging steps :
Is process gpnpd.bin running ?
[root@grac41 ~]# ps  -elf  | egrep "PID|gpnpd" | grep -v grep
--> Missing process gpnpd.bin 
Restart CRS with strace support
# crsctl stop crs -f
# strace -t -f -o crs_startup.trc crsctl start crs
Check for EACCES errors and check the return values of the execve() and access() system calls:
[root@grac41 oracle]# grep EACCES crs_startup.trc
28301 12:19:46 access("/u01/app/11204/grid/bin/gpnpd.bin", X_OK) = -1 EACCES (Permission denied)
Review crs_startup.trc in more detail
27345 12:15:35 execve("/u01/app/11204/grid/bin/gpnpd.bin", ["/u01/app/11204/grid/bin/"...], [/* 73 vars */] <unfinished ...>
27238 12:15:35 <... lseek resumed> )    = 164864
27345 12:15:35 <... execve resumed> )   = -1 EACCES (Permission denied)
27345 12:15:35 access("/u01/app/11204/grid/bin/gpnpd.bin", X_OK <unfinished ...>
27238 12:15:35 <... read resumed> "\25\23\"\1\23\3\t\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 256) = 256
27345 12:15:35 <... access resumed> )   = -1 EACCES (Permission denied)
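Because strace -f splits many calls into an "<unfinished ...>" line and a later "resumed" line, a plain grep for EACCES often shows the error without the path. The sketch below pairs the two halves per PID and lists every path that failed with EACCES. It assumes the trace layout shown above (PID in the first column); the function name eacces_paths is illustrative only.

```shell
# eacces_paths TRACEFILE
# Lists the unique paths whose system calls returned EACCES.
# Remembers the first quoted argument of an unfinished call per PID and
# reports it when the matching "resumed" line carries the error.
eacces_paths() {
    awk '
        /<unfinished \.\.\.>/ {
            if (match($0, /"[^"]*"/))
                pending[$1] = substr($0, RSTART + 1, RLENGTH - 2)
            next
        }
        /EACCES/ {
            if (match($0, /"[^"]*"/))
                print substr($0, RSTART + 1, RLENGTH - 2)
            else if ($1 in pending)
                print pending[$1]
        }
    ' "$1" | sort -u
}
```

Running eacces_paths crs_startup.trc on the trace written by the strace command above should narrow the search to the affected files.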
Verify problem with cluvfy
[grid@grac41 ~]$ cluvfy comp software -verbose 
Verifying software 
Check: Software
  Component: crs                      
  Node Name: grac41 
        Permissions of file "/u01/app/11204/grid/bin/gpnpd.bin" did not match the expected value. 
        [Expected = "0755" ; Found = "0444"]
    /u01/app/11204/grid/bin/gpnptool.bin..."Permissions" did not match reference
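The comparison that cluvfy reports can also be done by hand for a single file. This is a minimal sketch, assuming GNU stat on Linux; check_mode is a hypothetical helper name, and stat prints the mode without the leading zero (755 rather than 0755).

```shell
# check_mode FILE EXPECTED_MODE
# Compares the octal permission bits of FILE with EXPECTED_MODE.
# Returns 0 on a match, 1 on a mismatch, 2 if stat itself fails.
check_mode() {
    local file=$1 expected=$2 found
    found=$(stat -c '%a' "$file") || return 2
    if [ "$found" != "$expected" ]; then
        echo "$file: expected mode $expected, found $found"
        return 1
    fi
    echo "$file: mode $found OK"
}
```

For the test scenario above, check_mode $GRID_HOME/bin/gpnpd.bin 755 would report the mismatch. Resetting the file to the expected value from the cluvfy output (chmod 755) reverses the chmod 444 from the test and should allow the agent to execute the binary again.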

The GPnP profile is a small XML file located in GRID_HOME/gpnp/<hostname>/profiles/peer under the name profile.xml. It is used to establish the correct global personality of a node. Each node keeps a local copy of the GPnP profile, which is maintained by the GPnP daemon (GPnPD).
The GPnP profile stores the information required for the startup of Oracle Clusterware, such as the SPFILE location and the ASM disk string.
It contains various attributes defining node personality.
- Cluster name
- Network classifications (Public/Private)
- Storage to be used for CSS
- Storage to be used for ASM: SPFILE location, ASM disk string, etc.

- Digital signature information : The profile is security sensitive. It might identify the storage to be used as the root partition of a machine.  Hence, it contains digital signature information of the provisioning authority.
Here is the GPnP profile of my RAC setup.
gpnptool can be used for reading and editing the GPnP profile.
[root@host01 peer]# gpnptool get
Next, CRSD needs to read the OCR to start up the various resources on the node, and it updates the OCR as the status of the resources changes. Since the OCR is also stored in ASM, the location of the ASM SPfile must be known.
The ASM SPfile is searched for in this order:
  • GPnP profile
  • ORACLE_HOME/dbs/spfile<sid>.ora
  • ORACLE_HOME/dbs/init<sid>.ora
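The search order can be modelled with a few lines of shell. This is only a rough sketch of the lookup order described here, not how ASM implements it; the function name find_asm_pfile and its arguments are assumptions made for illustration.

```shell
# find_asm_pfile PROFILE_SPFILE ORACLE_HOME SID
# PROFILE_SPFILE is the SPfile value taken from the GPnP profile (may be empty).
# Prints the first location that applies and returns 1 if none is found.
find_asm_pfile() {
    local from_profile=$1 home=$2 sid=$3 candidate
    # 1. A value in the GPnP profile wins.
    if [ -n "$from_profile" ]; then
        echo "$from_profile"
        return 0
    fi
    # 2. and 3. Fall back to the spfile, then the init file, under dbs.
    for candidate in "$home/dbs/spfile$sid.ora" "$home/dbs/init$sid.ora"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

The first argument would typically be the value returned by gpnptool getpval -asm_spf; when it is empty, the two files under ORACLE_HOME/dbs are tried in turn.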
In a cluster environment, the location of the SPfile for ASM is read from the GPnP profile.
[grid@host01 peer]$ gpnptool getpval -asm_spf
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/app/11.2.0/grid/bin/gpnptool.bin getpval -asm_spf -p=profile.xml -o-
The output of the query shows that the SPfile is on ASM in the DATA diskgroup. To find out the location of the ASM disks, the following query is issued:
[root@host01 peer]# gpnptool getpval -asm_dis
ASM-Profile id="asm" DiscoveryString=""
The headers of every device in the disk string returned by the above query are scanned (if a disk string was configured at initial ASM setup time). Here the discovery string is blank because the ASM_DISKSTRING parameter has not been set. Hence, the headers of all ASM disks are scanned.
Here I show the output of the query only for the disk that contains the SPfile (spfflg is not null).
[root@host01 ~]# kfed read /dev/sdb3 | grep -E 'spf|ausize'
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.spfile:                       16 ; 0x0f4: 0x00000010
kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001
In the output above, we see that
     the device /dev/sdb3 contains a copy of the ASM spfile (spfflg=1).
     The ASM spfile starts at allocation unit 16 on the disk (spfile=16)
Considering the allocation unit size (kfdhdb.ausize = 1M), let’s dump the ASM spfile from the device:
[root@host01 ~]#  dd if=/dev/sdb3 of=spfileASM_Copy2.ora skip=16  bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.170611 seconds, 6.1 MB/s
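The dd arguments follow directly from the kfed fields: bs is the allocation unit size and skip is the allocation unit number of the spfile. The sketch below derives the command from kfed output instead of typing the numbers by hand. It only prints the command and does not run it. The function name kfed_to_dd is hypothetical, and, like the dd call above, it assumes the spfile fits into a single allocation unit (count=1).

```shell
# kfed_to_dd DEVICE OUTFILE
# Reads `kfed read DEVICE` output on stdin and prints a matching dd command.
# Returns 1 if the header does not flag an spfile copy (spfflg is 0 or missing).
kfed_to_dd() {
    awk -v dev="$1" -v out="$2" '
        $1 == "kfdhdb.ausize:" { ausize = $2 }
        $1 == "kfdhdb.spfile:" { spfile = $2 }
        $1 == "kfdhdb.spfflg:" { spfflg = $2 }
        END {
            if (spfflg + 0 == 0 || ausize + 0 == 0) exit 1
            printf "dd if=%s of=%s bs=%d skip=%d count=1\n", dev, out, ausize, spfile
        }'
}
```

A call such as kfed read /dev/sdb3 | kfed_to_dd /dev/sdb3 spfileASM_Copy2.ora prints the dd line with bs given in bytes (1048576 instead of 1M), which is equivalent.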
[root@host01 ~]# strings spfileASM_Copy2.ora
+ASM1.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM2.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM3.__oracle_base='/u01/app/grid'#ORACLE_BASE set from in memory value
+ASM3.asm_diskgroups='FRA'#Manual Mount
+ASM2.asm_diskgroups='FRA'#Manual Mount
+ASM1.asm_diskgroups='FRA'#Manual Mount
Using the parameters in SPfile, ASM is started.
Once ASM is up, OCR is read by CRSD and various resources on the node are started.
Each node reads the network information in the GPnP profile and, using GNS, negotiates an appropriate network identity for itself. Hence, nodes can be dynamically added and deleted.

- How to read the profile
[root@inssc3 bin]# ./gpnptool get
- How to find out whether the GPnP daemon is running on the local node
[root@host01 peer]# gpnptool lfind
Success. Local gpnpd found.
- How to find the location of ASM spfile if the ASM is down
[root@host01 peer]# gpnptool getpval -asm_spf
- How to find all RD-discoverable resources of given type
[root@host01 peer]# gpnptool find
Found 3 instances of service 'gpnp'.
        mdns:service:gpnp._tcp.local.://host03:18015/agent=gpnpd,cname=cluster01,host=host03,pid=5066/gpnpd h:host03 c:cluster01
        mdns:service:gpnp._tcp.local.://host02:17637/agent=gpnpd,cname=cluster01,host=host02,pid=5236/gpnpd h:host02 c:cluster01
        mdns:service:gpnp._tcp.local.://host01:16633/agent=gpnpd,cname=cluster01,host=host01,pid=5206/gpnpd h:host01 c:cluster01