1. Useful "OS command" and "ClusterWare
commands", "GREP commands " ,
"OS logfile location" & "Clusterware
logfile" location details
2. Generic troubleshooting hints : How to review
ClusterWare trace files
3, GENERIC Networking troubleshooting
4, Debugging complete ClusterWare startup with strace
5, Debugging a single ClusterWare process with strace
1. Useful OS and CW commands, grep commands, OS logfile location and Clusterware logfile location details
1 : Clusterware logfile structure
CW alert.log           alert<hostname>.log   ( most important one !! )
OHASD                  ohasd.log
CSSD                   ocssd.log
EVMD                   evmd.log
CRSD                   crsd.log
MDNSD                  mdnsd.log
GIPCD                  gipcd.log
GPNPD                  gpnpd.log
Agent directories      agent/ohasd
                       agent/ohasd/oraagent_grid
                       agent/ohasd/oracssdagent_root
                       agent/ohasd/oracssdmonitor_root
                       agent/ohasd/orarootagent_root
2 : OS System logs
HPUX      /var/adm/syslog/syslog.log
AIX       /bin/errpt -a
Linux     /var/log/messages
Windows   Refer to the .TXT log files under the Application/System log using the Windows Event Viewer
Solaris   /var/adm/messages
Linux sample
# grep 'May 20' ./grac41/var/log/messages > SYSLOG
--> Check SYSLOG for relevant errors
A typical CW error could look like this:
# cat /var/log/messages
May 13 13:48:27 grac41 OHASD[22203]: OHASD exiting; Directory /u01/app/11204/grid/log/grac41/ohasd not found
3 : Useful commands for a quick check of clusterware status
It may be useful to run all commands below just to get an idea of what is working and what is not.
3.1 : OS commands ( assume we have CW startup problems on grac41 )
# ping grac41
# route -n
# /bin/netstat -in
# /sbin/ifconfig -a
# /bin/ping -s <MTU> -c 2 -I source_IP nodename
# /bin/traceroute -s source_IP -r -F nodename-priv <MTU-28>
# /usr/bin/nslookup grac41
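A worked example with concrete values, assuming an MTU of 1500 and 192.168.2.102 as the local private IP (substitute your own addresses and hostnames); the traceroute packet length is MTU-28 to account for the 28 bytes of IP and UDP/ICMP headers:
# /bin/ping -s 1500 -c 2 -I 192.168.2.102 grac41-priv
# /bin/traceroute -s 192.168.2.102 -r -F grac41-priv 1472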
3.2 : Clusterware commands to debug startup problems
Check Clusterware status
# crsctl check crs
# crsctl check css
# crsctl check evm
# crsctl stat res -t -init
If the OHASD stack is completely up and running you can check your cluster resources with
# crsctl stat res -t
3.3 : Checking OLR to debug startup problems
# ocrcheck -local
# ocrcheck -local -config
3.4 : Checking OCR/Votedisks to debug startup problems
$ crsctl query css votedisk
The next 2 commands will only work once the startup problems are fixed
$ ocrcheck
$ ocrcheck -config
3.5 : Checking GPnP to debug startup problems
# $GRID_HOME/bin/gpnptool get
For further debugging
# $GRID_HOME/bin/gpnptool lfind
# $GRID_HOME/bin/gpnptool getpval -asm_spf -p=/u01/app/11204/grid/gpnp/profiles/peer/profile.xml
# $GRID_HOME/bin/gpnptool check -p=/u01/app/11204/grid/gpnp/profiles/peer/profile.xml
# $GRID_HOME/bin/gpnptool verify -p=/u01/app/11204/grid/gpnp/profiles/peer/profile.xml -w="/u01/app/11204/grid/gpnp/grac41/wallets/peer" -wu=peer
3.6 : Cluvfy commands to debug startup problems
Network problems:
$ cluvfy comp nodereach -n grac41 -verbose
Identify the interfaces used for public and private traffic and check the related networks
$ cluvfy comp nodecon -n grac41,grac42 -i eth1 -verbose     ( public interface )
$ cluvfy comp nodecon -n grac41,grac42 -i eth2 -verbose     ( private interface )
$ cluvfy comp nodecon -n grac41 -verbose
Testing multicast communication for multicast group "230.0.1.0" .
$ cluvfy stage -post hwos -n grac42
Cluvfy commands to verify ASM DG and Voting disk location
Note: Run cluvfy from a working node ( grac42 ) to get more details
[grid@grac42 ~]$ cluvfy comp vdisk -n grac41
ERROR: PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdh1"
--> From the error code we know the ASM disk group + Voting Disk location
$ cluvfy comp olr -verbose
$ cluvfy comp software -verbose
$ cluvfy comp ocr -n grac42,grac41
$ cluvfy comp sys -n grac41 -p crs -verbose
comp healthcheck is quite helpful to get an overview, but as OHASD is not running most of the errors are related to the CW startup problem.
$ cluvfy comp healthcheck -collect cluster -html
$ firefox cvucheckreport_523201416347.html
4 : Useful grep Commands
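The examples below use two local helper scripts, fn.sh and fn_egrep.sh, which are not part of the Grid Infrastructure distribution. They simply walk all trace files below the current directory (the unzipped TFA repository or $GRID_HOME/log/<node>) and print the file name as TraceFileName before each group of matches. A minimal sketch, assuming it is run from the top of the log tree:
#!/bin/sh
# fn_egrep.sh - minimal sketch: egrep all Clusterware trace/log files below the current directory
# usage: ./fn_egrep.sh "pattern"
PATTERN="$1"
find . -type f \( -name "*.log" -o -name "*.trc" -o -name "*.l??" \) 2>/dev/null | while read f
do
    if egrep -q "$PATTERN" "$f" 2>/dev/null
    then
        echo "TraceFileName: $f"
        egrep "$PATTERN" "$f"
        echo "--------------------------------------------------------------------"
    fi
done
fn.sh can be built the same way, printing a "Search String:" header first and using a fixed-string grep (grep -F) instead of egrep - this is an assumption based on the output shown in the examples.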
GPnP profile is not accessible - gpnpd needs to be fully up to serve the profile
$ fn_egrep.sh "Cannot get GPnP profile|Error put-profile CALL"
TraceFileName: ./grac41/agent/ohasd/orarootagent_root/orarootagent_root.log
2014-05-20 10:26:44.532: [ default][1199552256]Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
Cannot get GPnP profile
2014-04-21 15:27:06.838: [    GPNP][132114176]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2243] Result: (13) CLSGPNP_NO_DAEMON.
Error put-profile CALL to remote "tcp://grac41:56376" disco "mdns:service:gpnp._tcp.local.://grac41:56376/agent=gpnpd,cname=grac4,host=grac41,pid=4548/gpnpd h:grac41 c:grac4"
Network socket file doesn't have appropriate ownership or permission
# fn_egrep.sh "clsclisten: Permission denied"
[ COMMCRS][3534915328]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_MDNSD))
Problems with the Private Interconnect
$ fn.sh "2014-06-03" | egrep 'but no network HB|TraceFileName'
Search String: no network HB
TraceFileName: ./cssd/ocssd.log
2014-06-02 12:51:52.564: [    CSSD][2682775296]clssnmvDHBValidateNcopy: node 3, grac43, has a disk HB, but no network HB , ..
or
$ fn_egrep.sh "failed to resolve|gipcretFail|gipcretConnectionRefused" | egrep 'TraceFile|2014-05-20 11:0'
TraceFileName: ./grac41/crsd/crsd.log and ./grac41/evmd/evmd.log may report
2014-05-20 11:04:02.563: [GIPCXCPT][154781440] gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'grac41', port 'ffac-854b-c525-6f9c', hctx 0x2ed3940 [0000000000000010] { gipchaContext : host 'grac41', name 'd541-9a1e-7807-8f4a', luid 'f733b93a-00000000', numNode 0, numInf 1, usrFlags 0x0, flags 0x5 }, ret gipcretKeyNotFound (36)
2014-05-20 11:04:02.563: [GIPCHGEN][154781440] gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 806]: EXCEPTION[ ret gipcretKeyNotFound (36) ] failed to resolve ctx 0x2ed3940 [0000000000000010] { gipchaContext : host 'grac41', name 'd541-9a1e-7807-8f4a', luid 'f733b93a-00000000', numNode 0, numInf 1, usrFlags 0x0, flags 0x5 }, host 'grac41', port 'ffac-854b-c525-6f9c', flags 0x0
Is there a valid CI network device ?
# fn_egrep.sh "NETDATA" | egrep 'TraceFile|2014-06-03'
TraceFileName: ./gipcd/gipcd.log
2014-06-03 07:48:45.401: [ CLSINET][3977414400] Returning NETDATA: 1 interfaces   <-- ok
2014-06-03 07:52:51.589: [ CLSINET][1140848384] Returning NETDATA: 0 interfaces   <-- problems !
Are Voting Disks accessible ?
$ fn_egrep.sh "Successful discovery"
TraceFileName: ./grac41/cssd/ocssd.log
2014-05-22 13:41:38.776: [    CSSD][1839290112]clssnmvDiskVerify: Successful discovery of 0 disks
1 : Limit trace file size and file count by using the TFA command tfactl diagcollect
Note: only a single-node collection is necessary for a CW startup problem - here node grac41 has CW startup problems
# tfactl diagcollect -node grac41 -from "May/20/2014 06:00:00" -to "May/20/2014 15:00:00"
--> Scanning files from May/20/2014 06:00:00 to May/20/2014 15:00:00
...
Logs are collected to:
/u01/app/grid/tfa/repository/collection_Wed_May_21_09_19_10_CEST_2014_node_grac41/grac41.tfa_Wed_May_21_09_19_10_CEST_2014.zip
Extract the zip file and scan for various Clusterware errors
# mkdir /u01/TFA
# cd /u01/TFA
# unzip /u01/app/grid/tfa/repository/collection_Wed_May_21_09_19_10_CEST_2014_node_grac41/grac41.tfa_Wed_May_21_09_19_10_CEST_2014.zip
Locate important files in our unzipped TFA repository
# pwd
/u01/TFA/
# find . -name "alert*"
./grac41/u01/app/11204/grid/log/grac41/alertgrac41.log
./grac41/asm/+asm/+ASM1/trace/alert_+ASM1.log
./grac41/rdbms/grac4/grac41/trace/alert_grac41.log
# find . -name "mess*"
./grac41/var/log/messages
./grac41/var/log/messages-20140504
2 : Review Clusterware alert.log errors
# get_ca.sh alertgrac41.log 2014-05-23
->  File searched          : alertgrac41.log
->  Start search timestamp : 2014-05-23
->  End search timestamp   :
Begin: CNT 0 - TS --
2014-05-23 15:29:31.297: [mdnsd(16387)]CRS-5602:mDNS service stopping by request.
2014-05-23 15:29:34.211: [gpnpd(16398)]CRS-2329:GPNPD on node grac41 shutdown.
2014-05-23 15:29:45.785: [ohasd(2736)]CRS-2112:The OLR service started on node grac41.
2014-05-23 15:29:45.845: [ohasd(2736)]CRS-1301:Oracle High Availability Service started on node grac41.
2014-05-23 15:29:45.861: [ohasd(2736)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2014-05-23 15:29:49.999: [/u01/app/11204/grid/bin/orarootagent.bin(2798)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2014-05-23 15:29:55.075: [ohasd(2736)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2014-05-23 15:29:55.081: [gpnpd(2934)]CRS-2328:GPNPD started on node grac41.
2014-05-23 15:29:57.576: [cssd(3040)]CRS-1713:CSSD daemon is started in clustered mode
2014-05-23 15:29:59.331: [ohasd(2736)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-05-23 15:29:59.331: [ohasd(2736)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2014-05-23 15:30:01.905: [cssd(3040)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11204/grid/log/grac41/cssd/ocssd.log
2014-05-23 15:30:16.945: [cssd(3040)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11204/grid/log/grac41/cssd/ocssd.log
--> The get_ca.sh script adds a timestamp, reduces the output and only dumps errors for a certain day
In the above sample we can easily pinpoint the problem to a voting disk issue
If you can't find an obvious reason you still need to review your clusterware alert.log
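get_ca.sh is another local helper, not an Oracle-supplied tool; a minimal sketch that extracts all alert.log entries for one day could look like this (argument handling and output format are assumptions, and the grep assumes each entry starts with a YYYY-MM-DD timestamp with the message on the same or the following line):
#!/bin/sh
# get_ca.sh - minimal sketch: dump Clusterware alert.log entries for a given day
# usage: ./get_ca.sh <alert logfile> <YYYY-MM-DD>
LOGFILE="$1"
DAY="$2"
echo "->  File searched          : $LOGFILE"
echo "->  Start search timestamp : $DAY"
grep -A1 "^$DAY" "$LOGFILE"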
3 : Review Clusterware logfiles for errors and for resources failing or not starting up
$ fn.sh 2014-05-25 | egrep 'TraceFileName|CRS-|ORA-|TNS-|LFI-|KFNDG-|KFED-|KFOD-|CLSDNSSD-|CLSGN-|CLSMDNS-|CLS.-|NDFN-|EVM-|GIPC-|PRK.-|PRV.-|PRC.-|PRIF-|SCLS-|PROC-|PROCL-|PROT-|PROTL-'
TraceFileName: ./crsd/crsd.log
2014-05-25 11:29:04.968: [   CRSPE][1872742144]{1:62631:2} CRS-2672: Attempting to start 'ora.SSD.dg' on 'grac41'
2014-05-25 11:29:04.999: [   CRSPE][1872742144]{1:62631:2} CRS-2672: Attempting to start 'ora.DATA.dg' on 'grac41'
2014-05-25 11:29:05.063: [   CRSPE][1872742144]{1:62631:2} CRS-2672: Attempting to start 'ora.FRA.dg' on 'grac41'
....
If a certain resource like ora.net1.network doesn't start, grep for details using the resource name
[grid@grac42 grac42]$ fn.sh "ora.net1.network" | egrep '2014-05-31 11|TraceFileName'
TraceFileName: ./agent/crsd/orarootagent_root/orarootagent_root.log
2014-05-31 11:58:27.899: [    AGFW][1829066496]{2:12808:5} Agent received the message: RESOURCE_START[ora.net1.network grac42 1] ID 4098:403
2014-05-31 11:58:27.899: [    AGFW][1829066496]{2:12808:5} Preparing START command for: ora.net1.network grac42 1
2014-05-31 11:58:27.899: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 state changed from: OFFLINE to: STARTING
2014-05-31 11:58:27.917: [    AGFW][1826965248]{2:12808:5} Command: start for resource: ora.net1.network grac42 1 completed with status: SUCCESS
2014-05-31 11:58:27.919: [    AGFW][1829066496]{2:12808:5} Agent sending reply for: RESOURCE_START[ora.net1.network grac42 1] ID 4098:403
2014-05-31 11:58:27.969: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 state changed from: STARTING to: UNKNOWN
4 : Review the ASM alert.log
5 : Track generation of new Tints and follow these Tints
$ fn.sh 2014-05-25 | egrep 'TraceFileName|Generating new Tint'
TraceFileName: ./agent/ohasd/orarootagent_root/orarootagent_root.log
2014-05-25 13:52:07.862: [    AGFW][2550134528]{0:11:3} Generating new Tint for unplanned state change. Original Tint: {0:0:2}
2014-05-25 13:52:36.126: [    AGFW][2550134528]{0:11:6} Generating new Tint for unplanned state change. Original Tint: {0:0:2}
[grid@grac41 grac41]$ fn.sh "{0:11:3}" | more
Search String: {0:11:3}
TraceFileName: ./alertgrac41.log
[/u01/app/11204/grid/bin/cssdmonitor(1833)]CRS-5822:Agent '/u01/app/11204/grid/bin/cssdmonitor_root' disconnected from server. Details at (:CRSAGF00117:) {0:11:3} in /u01/app/11204/grid/log/grac41/agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log.
--------------------------------------------------------------------
TraceFileName: ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-05-19 14:34:03.312: [   AGENT][3481253632]{0:11:3} {0:11:3} Created alert : (:CRSAGF00117:) : Disconnected from server, Agent is shutting down.
2014-05-19 14:34:03.312: [ USRTHRD][3481253632]{0:11:3} clsncssd_exit: CSSD Agent was asked to exit with exit code 2
2014-05-19 14:34:03.312: [ USRTHRD][3481253632]{0:11:3} clsncssd_exit: No connection with CSS, exiting.
6 : Investigate GENERIC File Protection problems for log file location, ownership and permissions
==> For further details see the GENERIC File Protection troubleshooting chapter
7 : Investigate GENERIC Networking problems
==> For further details see the GENERIC Networking troubleshooting chapter
8 : Check logs for a certain resource for a certain time ( in this sample the ora.gpnpd resource was used )
[root@grac41 grac41]# fn.sh "ora.gpnpd" | egrep "TraceFileName|2014-05-22 07" | more
TraceFileName: ./agent/ohasd/oraagent_grid/oraagent_grid.log
2014-05-22 07:18:27.281: [ora.gpnpd][3696797440]{0:0:2} [check] clsdmc_respget return: status=0, ecode=0
2014-05-22 07:18:57.291: [ora.gpnpd][3698898688]{0:0:2} [check] clsdmc_respget return: status=0, ecode=0
2014-05-22 07:19:27.290: [ora.gpnpd][3698898688]{0:0:2} [check] clsdmc_respget return: status=0, ecode=0
9 : Check related traces depending on startup dependencies
The GPnP daemon has the following dependencies: OHASD (root) starts --> OHASD ORAagent (grid) starts --> GPnP (grid)
The following tracefiles need to be reviewed first for additional info
./grac41/u01/app/11204/grid/log/grac41/gpnpd/gpnpd.log      ( see above )
./grac41/u01/app/11204/grid/log/grac41/ohasd/ohasd.log
./grac41/u01/app/11204/grid/log/grac41/agent/ohasd/oraagent_grid/oraagent_grid.log
10 : Check for tracefiles updated very frequently ( this helps to identify looping processes )
# date ; find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS %h/%f\n" | sort -n | tail -5
Thu May 22 07:52:45 CEST 2014
2014-05-22 07:52:12.1781722420 ./grac41/alertgrac41.log
2014-05-22 07:52:44.8401175210 ./grac41/agent/ohasd/oraagent_grid/oraagent_grid.log
2014-05-22 07:52:45.2701299670 ./grac41/client/crsctl_grid.log
2014-05-22 07:52:45.2901305450 ./grac41/ohasd/ohasd.log
2014-05-22 07:52:45.3221314710 ./grac41/client/olsnodes.log
# date ; find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS %h/%f\n" | sort -n | tail -5
Thu May 22 07:52:48 CEST 2014
2014-05-22 07:52:12.1781722420 ./grac41/alertgrac41.log
2014-05-22 07:52:47.3701907460 ./grac41/client/crsctl_grid.log
2014-05-22 07:52:47.3901913240 ./grac41/ohasd/ohasd.log
2014-05-22 07:52:47.4241923080 ./grac41/client/olsnodes.log
2014-05-22 07:52:48.3812200070 ./grac41/agent/ohasd/oraagent_grid/oraagent_grid.log
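To watch this continuously instead of re-running the command by hand, a simple loop works as well (the 5-second interval is an arbitrary choice):
# while true; do date; find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS %h/%f\n" | sort -n | tail -5; sleep 5; done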
Check these trace files with tail -f
# tail -f ./grac41/agent/ohasd/oraagent_grid/oraagent_grid.log
[  clsdmc][1975785216]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD)) with status 9
2014-05-22 07:56:11.198: [ora.gpnpd][1975785216]{0:0:2} [start] Error = error 9 encountered when connecting to GPNPD
2014-05-22 07:56:12.199: [ora.gpnpd][1975785216]{0:0:2} [start] without returnbuf
2014-05-22 07:56:12.382: [ COMMCRS][1966565120]clsc_connect: (0x7fd9500cc070) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD))
[  clsdmc][1975785216]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD)) with status 9
2014-05-22 07:56:12.382: [ora.gpnpd][1975785216]{0:0:2} [start] Error = error 9 encountered when connecting to GPNPD
2014-05-22 07:56:13.382: [ora.gpnpd][1975785216]{0:0:2} [start] without returnbuf
2014-05-22 07:56:13.553: [ COMMCRS][1966565120]clsc_connect: (0x7fd9500cc070) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD))
--> The problem is that we can't connect to the GPNPD listener
# ps -elf | egrep "PID|d.bin|ohas" | grep -v grep
F S UID      PID   PPID  C PRI  NI ADDR  SZ     WCHAN  STIME TTY   TIME     CMD
4 S root     1560     1  0 -40   -    - 160927 futex_ 07:56 ?     00:00:00 /u01/app/11204/grid/bin/cssdmonitor
4 S root     4494     1  0  80   0    -   2846 pipe_w May21 ?     00:00:00 /bin/sh /etc/init.d/init.ohasd run
4 S root    30441     1 23  80   0    - 176982 futex_ 07:49 ?     00:02:04 /u01/app/11204/grid/bin/ohasd.bin reboot
4 S grid    30612     1  0  80   0    - 167327 futex_ 07:50 ?     00:00:05 /u01/app/11204/grid/bin/oraagent.bin
0 S grid    30623     1  0  80   0    -  74289 poll_s 07:50 ?     00:00:00 /u01/app/11204/grid/bin/mdnsd.bin
--> Indeed gpnpd is not running. Investigate this further by debugging CW with strace
11 : If still no root cause was found, grep all messages for that period and review the output carefully
# fn.sh "2014-05-20 07:2" | more
Search String: 2014-05-20 07:2
...
--------------------------------------------------------------------
TraceFileName: ./grac41/u01/app/11204/grid/log/grac41/agent/ohasd/oraagent_grid/oraagent_grid.l01
2014-05-20 07:23:04.338: [    AGFW][374716160]{0:0:2} Agent received the message: AGENT_HB[Engine] ID 12293:1236
....
--------------------------------------------------------------------
TraceFileName: ./grac41/u01/app/11204/grid/log/grac41/gpnpd/gpnpd.log
2014-05-20 07:23:13.385: [ default][4133218080]
2014-05-20 07:23:13.385: [ default][4133218080]gpnpd START pid=16641 Oracle Grid Plug-and-Play Daemon
- Resource reports status STARTING for a long time before failing with CRS errors
- After some time the resource becomes OFFLINE
Debug startup problems for the GPnP daemon
Case #1 : Check that the GPnP daemon can write to the trace file location and new timestamps are written
The following directory/files need to have proper protections :
Trace directory : ./log/grac41/gpnpd
STDOUT log file : ./log/grac41/gpnpd/gpnpdOUT.log
Error log file  : ./log/grac41/gpnpd/gpnpd.log
If gpnpdOUT.log and gpnpd.log are not updated when starting the GPnP daemon you need to review your file protections
Sample for the GPnP resource:
# ls -ld ./grac41/u01/app/11204/grid/log/grac41/gpnpd
drwxr-xr-x. 2 root root 4096 May 21 09:52 ./grac41/u01/app/11204/grid/log/grac41/gpnpd
# ls -l ./grac41/u01/app/11204/grid/log/grac41/gpnpd
-rw-r--r--. 1 root root 420013 May 21 09:35 gpnpd.log
-rw-r--r--. 1 root root  26567 May 21 09:31 gpnpdOUT.log
==> Here we can see that the trace files are owned by root, which is wrong !
After changing the directory and trace files with chown grid:oinstall ... the traces were successfully written
If unsure about the protections, verify them against a cluster node where CRS is up and running
# ls -ld /u01/app/11204/grid/log/grac41/gpnpd
drwxr-x---. 2 grid oinstall 4096 May 20 13:53 /u01/app/11204/grid/log/grac41/gpnpd
# ls -l /u01/app/11204/grid/log/grac41/gpnpd
-rw-r--r--. 1 grid oinstall  122217 May 19 13:35 gpnpd_1.log
-rw-r--r--. 1 grid oinstall 1747836 May 20 12:32 gpnpd.log
-rw-r--r--. 1 grid oinstall   26567 May 20 12:31 gpnpdOUT.log
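The fix applied here, following the ownership seen on the healthy node (grid:oinstall is the Grid Infrastructure owner in this cluster - adjust to your own installation user and group):
# chown grid:oinstall /u01/app/11204/grid/log/grac41/gpnpd
# chown grid:oinstall /u01/app/11204/grid/log/grac41/gpnpd/gpnpd.log /u01/app/11204/grid/log/grac41/gpnpd/gpnpdOUT.log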
Case #2 : Check that IPC sockets have proper protection ( this info is not available via the tfa collector )
( verify this against a node where CRS is up and running )
# ls -l /var/tmp/.oracle | grep -i gpnp
srwxrwxrwx. 1 grid oinstall 0 May 20 12:31 ora_gipc_GPNPD_grac41
-rw-r--r--. 1 grid oinstall 0 May 20 10:11 ora_gipc_GPNPD_grac41_lock
srwxrwxrwx. 1 grid oinstall 0 May 20 12:31 sgrac41DBG_GPNPD
==> Again check this against a working cluster node
You may consider comparing all IPC socket info available in /var/tmp/.oracle
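One way to do that comparison, assuming passwordless ssh between the cluster nodes is configured (as is usual for a Grid Infrastructure installation); socket names embed the hostname, so concentrate on differences in permissions and ownership rather than the names themselves:
# ls -l /var/tmp/.oracle | awk '{print $1, $3, $4, $NF}' | sort > /tmp/sockets_grac41.txt
# ssh grac42 "ls -l /var/tmp/.oracle" | awk '{print $1, $3, $4, $NF}' | sort > /tmp/sockets_grac42.txt
# diff /tmp/sockets_grac41.txt /tmp/sockets_grac42.txt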
The same sort of debugging can be used for other processes like the MDNSD daemon
The MDNSD process can't listen on the IPC socket grac41DBG_MDNSD and terminates
Grep command:
$ fn_egrep.sh "clsclisten: Permission denied"
TraceFileName: ./grac41/mdnsd/mdnsd.log
2014-05-19 08:08:53.097: [ COMMCRS][2179102464]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_MDNSD))
2014-05-19 08:10:58.177: [ COMMCRS][3534915328]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_MDNSD))
Trace file extract: ./grac41/mdnsd/mdnsd.log
2014-05-19 08:08:53.087: [ default][2187654912]mdnsd START pid=11855
2014-05-19 08:08:53.097: [ COMMCRS][2179102464]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_MDNSD))
--> Permission problems for the MDNSD resource
2014-05-19 08:08:53.097: [  clsdmt][2181203712]Fail to listen to (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_MDNSD))
2014-05-19 08:08:53.097: [  clsdmt][2181203712]Terminating process
2014-05-19 08:08:53.097: [    MDNS][2181203712] clsdm requested mdnsd exit
2014-05-19 08:08:53.097: [    MDNS][2181203712] mdnsd exit
2014-05-19 08:10:58.168: [ default][3543467776]
Case #3 : Check gpnpd.log for a successful write of the related PID file
# egrep "PID for the Process|Creating PID|Writing PID" ./grac41/u01/app/11204/grid/log/grac41/gpnpd/gpnpd.log
2014-05-20 07:23:13.417: [ clsdmt][4121581312]PID for the Process [16641], connkey 10
2014-05-20 07:23:13.417: [ clsdmt][4121581312]Creating PID [16641] file for home /u01/app/11204/grid host grac41 bin gpnp to /u01/app/11204/grid/gpnp/init/
2014-05-20 07:23:13.417: [ clsdmt][4121581312]Writing PID [16641] to the file [/u01/app/11204/grid/gpnp/init/grac41.pid]
Case #4 : Check the gpnpd.log file for fatal errors like PROCL-5 and PROCL-26
# less ./grac41/u01/app/11204/grid/log/grac41/gpnpd/gpnpd.log
2014-05-20 07:23:14.377: [    GPNP][4133218080]clsgpnpd_openLocalProfile: [at clsgpnpd.c:3477] Got local profile from file cache provider (LCP-FS).
2014-05-20 07:23:14.380: [    GPNP][4133218080]clsgpnpd_openLocalProfile: [at clsgpnpd.c:3532] Got local profile from OLR cache provider (LCP-OLR).
2014-05-20 07:23:14.385: [    GPNP][4133218080]procr_open_key_ext: OLR api procr_open_key_ext failed for key SYSTEM.GPnP.profiles.peer.pending
2014-05-20 07:23:14.386: [    GPNP][4133218080]procr_open_key_ext: OLR current boot level : 7
2014-05-20 07:23:14.386: [    GPNP][4133218080]procr_open_key_ext: OLR error code : 5
2014-05-20 07:23:14.386: [    GPNP][4133218080]procr_open_key_ext: OLR error message : PROCL-5: User does not have permission to perform a local registry operation on this key. Authentication error [User does not have permission to perform this operation] [0]
2014-05-20 07:23:14.386: [    GPNP][4133218080]clsgpnpco_ocr2profile: [at clsgpnpco.c:578] Result: (58) CLSGPNP_OCR_ERR. Failed to open requested OLR Profile.
2014-05-20 07:23:14.386: [    GPNP][4133218080]clsgpnpd_lOpen: [at clsgpnpd.c:1734] Listening on ipc://GPNPD_grac41
2014-05-20 07:23:14.386: [    GPNP][4133218080]clsgpnpd_lOpen: [at clsgpnpd.c:1743] GIPC gipcretFail (1) gipcListen listen failure on
2014-05-20 07:23:14.386: [ default][4133218080]GPNPD failed to start listening for GPnP peers.
2014-05-20 07:23:14.388: [    GPNP][4133218080]clsgpnpd_term: [at clsgpnpd.c:1344] STOP GPnPD terminating. Closing connections...
2014-05-20 07:23:14.400: [ default][4133218080]clsgpnpd_term STOP terminating.
- The private IP address is not directly used by clusterware
- If the IP changes from 192.168.2.102 to 192.168.2.108 CW still comes up, as the network address 192.168.2.0 does not change
- If the IP changes from 192.168.2.102 to 192.168.3.103 CW doesn't come up, as the network address changed from 192.168.2.0 to 192.168.3.0 - the network address is used in the GPnP profile.xml for the private and public networks
- Check the crfmond/crfmond.trc trace file for private network errors ( useful for CI errors )
- If you get any GIPC error message always think of a real network problem first
Case #1 : Nameserver not running/available
Reported error in evmd.log                     : [ OCRMSG][3360876320]GIPC error [29] msg [gipcretConnectionRefused]
Reported Clusterware error in the CW alert.log : CRS-5011:Check of resource "+ASM" failed:
Testing scenario : Stop the nameserver and restart CRS
# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Clusterware status
[root@grac41 gpnpd]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
--> CSS and HAS are ONLINE - EVM and CRS are OFFLINE
# ps -elf | egrep "PID|d.bin|ohas|oraagent|orarootagent|cssdagent|cssdmonitor" | grep -v grep
F S UID      PID   PPID  C PRI  NI ADDR  SZ     WCHAN  STIME TTY   TIME     CMD
4 S root     5396     1  0  80   0    -   2847 pipe_w 10:52 ?     00:00:00 /bin/sh /etc/init.d/init.ohasd run
4 S root     9526     1  3  80   0    - 178980 futex_ 14:47 ?     00:00:08 /u01/app/11204/grid/bin/ohasd.bin reboot
4 S grid     9705     1  0  80   0    - 174922 futex_ 14:47 ?     00:00:00 /u01/app/11204/grid/bin/oraagent.bin
0 S grid     9716     1  0  80   0    -  74289 poll_s 14:47 ?     00:00:00 /u01/app/11204/grid/bin/mdnsd.bin
0 S grid     9749     1  1  80   0    - 127375 hrtime 14:47 ?     00:00:02 /u01/app/11204/grid/bin/gpnpd.bin
0 S grid     9796     1  1  80   0    - 159711 hrtime 14:47 ?     00:00:04 /u01/app/11204/grid/bin/gipcd.bin
4 S root     9799     1  1  80   0    - 168656 futex_ 14:47 ?     00:00:03 /u01/app/11204/grid/bin/orarootagent.bin
4 S root     9812     1  3 -40   -    - 160908 hrtime 14:47 ?     00:00:08 /u01/app/11204/grid/bin/osysmond.bin
4 S root     9823     1  0 -40   -    - 162793 futex_ 14:47 ?     00:00:00 /u01/app/11204/grid/bin/cssdmonitor
4 S root     9842     1  0 -40   -    - 162920 futex_ 14:47 ?     00:00:00 /u01/app/11204/grid/bin/cssdagent
4 S grid     9855     1  2 -40   -    - 166594 futex_ 14:47 ?     00:00:04 /u01/app/11204/grid/bin/ocssd.bin
4 S root    10884     1  1  80   0    - 159388 futex_ 14:48 ?     00:00:02 /u01/app/11204/grid/bin/octssd.bin reboot
0 S grid    10904     1  0  80   0    -  75285 hrtime 14:48 ?     00:00:00 /u01/app/11204/grid/bin/evmd.bin
$ crsi
NAME                           TARGET     STATE        SERVER       STATE_DETAILS
------------------------------ ---------- ------------ ------------ ------------------
ora.asm                        ONLINE     OFFLINE                   CLEANING
ora.cluster_interconnect.haip  ONLINE     ONLINE       grac41
ora.crf                        ONLINE     ONLINE       grac41
ora.crsd                       ONLINE     OFFLINE
ora.cssd                       ONLINE     ONLINE       grac41
ora.cssdmonitor                ONLINE     ONLINE       grac41
ora.ctssd                      ONLINE     ONLINE       grac41       OBSERVER
ora.diskmon                    OFFLINE    OFFLINE
ora.drivers.acfs               ONLINE     ONLINE       grac41
ora.evmd                       ONLINE     INTERMEDIATE grac41
ora.gipcd                      ONLINE     ONLINE       grac41
ora.gpnpd                      ONLINE     ONLINE       grac41
ora.mdnsd                      ONLINE     ONLINE       grac41
--> The Event manager is in INTERMEDIATE state --> need to review the EVMD logfile first
Detailed tracefile report
Grep syntax:
$ fn_egrep.sh "failed to resolve|gipcretFail" | egrep 'TraceFile|2014-05-20 11:0'
Failed case:
TraceFileName: ./grac41/crsd/crsd.log and ./grac41/evmd/evmd.log may report
2014-05-20 11:04:02.563: [GIPCXCPT][154781440] gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'grac41', port 'ffac-854b-c525-6f9c', hctx 0x2ed3940 [0000000000000010] { gipchaContext : host 'grac41', name 'd541-9a1e-7807-8f4a', luid 'f733b93a-00000000', numNode 0, numInf 1, usrFlags 0x0, flags 0x5 }, ret gipcretKeyNotFound (36)
2014-05-20 11:04:02.563: [GIPCHGEN][154781440] gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 806]: EXCEPTION[ ret gipcretKeyNotFound (36) ] failed to resolve ctx 0x2ed3940 [0000000000000010] { gipchaContext : host 'grac41', name 'd541-9a1e-7807-8f4a', luid 'f733b93a-00000000', numNode 0, numInf 1, usrFlags 0x0, flags 0x5 }, host 'grac41', port 'ffac-854b-c525-6f9c', flags 0x0
--> Both requests try to use a host name. If this can't be resolved we very likely have a nameserver problem !
TraceFileName: ./grac41/ohasd/ohasd.log reports
2014-05-20 11:03:21.364: [GIPCXCPT][2905085696]gipchaInternalReadGpnp: No network info configured in GPNP, using defaults, ret gipcretFail (1)
TraceFileName: ./evmd/evmd.log
2014-05-13 15:01:00.690: [ OCRMSG][2621794080]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-05-13 15:01:00.690: [ OCRMSG][2621794080]GIPC error [29] msg [gipcretConnectionRefused]
2014-05-13 15:01:00.690: [ OCRMSG][2621794080]prom_connect: error while waiting for connection complete [24]
2014-05-13 15:01:00.690: [ CRSOCR][2621794080] OCR context init failure. Error: PROC-32: Cluster Ready Services on the local node
TraceFileName: ./grac41/gpnpd/gpnpd.log
2014-05-22 11:18:02.209: [ OCRCLI][1738393344]proac_con_init: Local listener using IPC. [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROC))]
2014-05-22 11:18:02.209: [ OCRMSG][1738393344]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-05-22 11:18:02.209: [ OCRMSG][1738393344]GIPC error [29] msg [gipcretConnectionRefused]
2014-05-22 11:18:02.209: [ OCRMSG][1738393344]prom_connect: error while waiting for connection complete [24]
2014-05-22 11:18:02.209: [ OCRCLI][1738393344]proac_con_init: Failed to connect to server [24]
2014-05-22 11:18:02.209: [ OCRCLI][1738393344]proac_con_init: Post sema. Con count [1]
[ OCRCLI][1738393344]ac_init:2: Could not initialize con structures. proac_con_init failed with [32]
Debug the problem using cluvfy
[grid@grac42 ~]$ cluvfy comp nodereach -n grac41
Verifying node reachability
Checking node reachability...
PRVF-6006 : Unable to reach any of the nodes
PRKN-1034 : Failed to retrieve IP address of host "grac41"
==> Confirmation that we have a nameserver problem
Verification of node reachability was unsuccessful on all the specified nodes.
[grid@grac42 ~]$ cluvfy comp nodecon -n grac41
Verifying node connectivity
ERROR:
PRVF-6006 : Unable to reach any of the nodes
PRKN-1034 : Failed to retrieve IP address of host "grac41"
Verification cannot proceed
Verification of node connectivity was unsuccessful on all the specified nodes.
Debug the problem with OS commands like ping and nslookup
==> For further details see the GENERIC Networking troubleshooting chapter
Case #2 : Private interface down or wrong IP address - CSSD not starting
Reported Clusterware error in the CW alert.log : [/u01/app/11204/grid/bin/cssdagent(16445)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'.
Reported in ocssd.log                          : [ CSSD][491194112]clssnmvDHBValidateNcopy: node 1, grac41, has a disk HB, but no network HB,
Reported in crfmond.log                        : [ CRFM][4239771392]crfm_connect_to: Wait failed with gipcret: 16 for conaddr tcp://192.168.2.103:61020
Testing scenario : Shut down the private interface
[root@grac42 ~]# ifconfig eth2 down
[root@grac42 ~]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 08:00:27:DF:79:B9
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:754556 errors:0 dropped:0 overruns:0 frame:0
          TX packets:631900 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:378302114 (360.7 MiB)  TX bytes:221328282 (211.0 MiB)
[root@grac42 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Clusterware status :
[root@grac42 grac42]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[grid@grac42 grac42]$ crsi
NAME                           TARGET     STATE        SERVER       STATE_DETAILS
------------------------------ ---------- ------------ ------------ ------------------
ora.asm                        ONLINE     OFFLINE      grac41       Instance Shutdown
ora.cluster_interconnect.haip  ONLINE     OFFLINE
ora.crf                        ONLINE     ONLINE       grac42
ora.crsd                       ONLINE     OFFLINE
ora.cssd                       ONLINE     OFFLINE                   STARTING
ora.cssdmonitor                ONLINE     ONLINE       grac42
ora.ctssd                      ONLINE     OFFLINE
ora.diskmon                    OFFLINE    OFFLINE
ora.drivers.acfs               ONLINE     OFFLINE
ora.evmd                       ONLINE     OFFLINE
ora.gipcd                      ONLINE     ONLINE       grac42
ora.gpnpd                      ONLINE     ONLINE       grac42
ora.mdnsd                      ONLINE     ONLINE       grac42
--> CSSD is in mode STARTING and not progressing over time. After some minutes CSSD goes OFFLINE
Tracefile details:
alertgrac42.log:
[cssd(16469)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in .../cssd/ocssd.log
2014-05-31 14:15:41.828:
[cssd(16469)]CRS-1603:CSSD on node grac42 shutdown by user.
2014-05-31 14:15:41.828:
[/u01/app/11204/grid/bin/cssdagent(16445)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in ../agent/ohasd/oracssdagent_root/oracssdagent_root.log.
ocssd.log:
2014-05-31 14:23:11.534: [    CSSD][491194112]clssnmvDHBValidateNcopy: node 1, grac41, has a disk HB, but no network HB, DHB has rcfg 296672934, wrtcnt, 25000048, LATS 9730774, lastSeqNo 25000045, uniqueness 1401378465, timestamp 1401538986/54631634
2014-05-31 14:23:11.550: [    CSSD][481683200]clssnmvDHBValidateNcopy: node 1, grac41, has a disk HB, but no network HB, DHB has rcfg 296672934, wrtcnt, 25000050, LATS 9730794, lastSeqNo 25000047, uniqueness 1401378465, timestamp 1401538986/54632024
Using grep to locate errors
The CRF resource checks the CI every 5s and reports errors:
$ fn.sh 2014-06-03 | egrep 'TraceFileName|failed'
TraceFileName: ./crfmond/crfmond.log
2014-06-03 08:28:40.859: [    CRFM][4239771392]crfm_connect_to: Wait failed with gipcret: 16 for conaddr tcp://192.168.2.103:61020
2014-06-03 08:28:46.065: [    CRFM][4239771392]crfm_connect_to: Wait failed with gipcret: 16 for conaddr tcp://192.168.2.103:61020
2014-06-03 08:28:51.271: [    CRFM][4239771392]crfm_connect_to: Wait failed with gipcret: 16 for conaddr tcp://192.168.2.103:61020
2014-06-03 08:37:11.264: [ CSSCLNT][4243982080]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_grac42_)) failed, rc 13
DHB has rcfg 296672934, wrtcnt, 23890957, LATS 9730794, lastSeqNo 23890939, uniqueness 1401292202, timestamp 1401538981/89600384
Do we have any network problems ?
$ fn.sh "2014-06-03" | egrep 'but no network HB|TraceFileName'
Search String: no network HB
TraceFileName: ./cssd/ocssd.log
2014-06-02 12:51:52.564: [    CSSD][2682775296]clssnmvDHBValidateNcopy: node 3, grac43, has a disk HB, but no network HB, DHB has rcfg 297162159, wrtcnt, 24036295, LATS 167224, lastSeqNo 24036293, uniqueness 1401692525, timestamp 1401706311/13179394
2014-06-02 12:51:53.569: [    CSSD][2682775296]clssnmvDHBValidateNcopy: node 1, grac41, has a disk HB, but no network HB, DHB has rcfg 297162159, wrtcnt, 25145340, LATS 168224, lastSeqNo 25145334, uniqueness 1401692481, timestamp 1401706313/13192074
Is there a valid CI network device ?
# fn_egrep.sh "NETDATA" | egrep 'TraceFile|2014-06-03'
TraceFileName: ./gipcd/gipcd.log
2014-06-03 07:48:40.372: [ CLSINET][3977414400] Returning NETDATA: 1 interfaces
2014-06-03 07:48:45.401: [ CLSINET][3977414400] Returning NETDATA: 1 interfaces   --> ok
2014-06-03 07:52:51.589: [ CLSINET][1140848384] Returning NETDATA: 0 interfaces   --> problem with CI
2014-06-03 07:52:51.669: [ CLSINET][1492440832] Returning NETDATA: 0 interfaces
Debug with cluvfy
[grid@grac41 ~]$ cluvfy comp nodecon -n grac41,grac42 -i eth1,eth2
Verifying node connectivity
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
ERROR:
PRVG-11049 : Interface "eth2" does not exist on nodes "grac42"
Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"
TCP connectivity check passed for subnet "192.168.1.0"
Check: Node connectivity for interface "eth2"
Node connectivity failed for interface "eth2"
TCP connectivity check passed for subnet "192.168.2.0"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.0.2.0".
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed for subnet "192.168.2.0".
Subnet mask consistency check passed for subnet "169.254.0.0".
Subnet mask consistency check passed for subnet "192.168.122.0".
Subnet mask consistency check passed.
Node connectivity check failed
Verification of node connectivity was unsuccessful on all the specified nodes.
Debug using OS commands
[grid@grac42 NET]$ /bin/ping -s 1500 -c 2 -I 192.168.2.102 192.168.2.101
bind: Cannot assign requested address
[grid@grac42 NET]$ /bin/ping -s 1500 -c 2 -I 192.168.2.102 192.168.2.102
bind: Cannot assign requested address
Verify the GPnP profile and find out the CI device
[root@grac42 crfmond]# $GRID_HOME/bin/gpnptool get 2>/dev/null | xmllint --format - | egrep 'CSS-Profile|ASM-Profile|Network id'
  <gpnp:HostNetwork id="gen" HostName="*">
    <gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
    <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>
--> eth2 is our cluster interconnect
[root@grac42 crfmond]# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 08:00:27:DF:79:B9
          inet addr:192.168.3.102  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fedf:79b9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
--> Here we have a wrong address: 192.168.3.102 should be 192.168.2.102
Note: the CI device should have the following flags : UP BROADCAST RUNNING MULTICAST
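A possible recovery for this test scenario, assuming the correct address and netmask are the ones known from the working configuration (on a real system also fix the persistent network configuration, e.g. ifcfg-eth2, instead of only the running interface):
# ifconfig eth2 192.168.2.102 netmask 255.255.255.0 up
# crsctl stop crs -f        ( only if the stack is stuck in STARTING )
# crsctl start crs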
Case #3 : Public interface down - public network ora.net1.network not starting
Reported in ./agent/crsd/orarootagent_root/orarootagent_root.log
2014-05-31 11:58:27.899: [    AGFW][1829066496]{2:12808:5} Preparing START command for: ora.net1.network grac42 1
2014-05-31 11:58:27.969: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 state changed from: STARTING to: UNKNOWN
2014-05-31 11:58:27.969: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 would be continued to monitored!
Reported Clusterware error in the CW alert.log: no errors reported
Testing scenario :
- Shut down the public interface
[root@grac42 evmd]# ifconfig eth1 down
[root@grac42 evmd]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 08:00:27:63:08:07
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:2889 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2458 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:484535 (473.1 KiB)  TX bytes:316359 (308.9 KiB)
[root@grac42 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Clusterware status :
[grid@grac42 grac42]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
--> OHASD stack is ok !
[grid@grac42 grac42]$ crsi
NAME                           TARGET     STATE        SERVER       STATE_DETAILS
------------------------------ ---------- ------------ ------------ ------------------
ora.asm                        ONLINE     ONLINE       grac42       Started
ora.cluster_interconnect.haip  ONLINE     ONLINE       grac42
ora.crf                        ONLINE     ONLINE       grac42
ora.crsd                       ONLINE     ONLINE       grac42
ora.cssd                       ONLINE     ONLINE       grac42
ora.cssdmonitor                ONLINE     ONLINE       grac42
ora.ctssd                      ONLINE     ONLINE       grac42       OBSERVER
ora.diskmon                    OFFLINE    OFFLINE
ora.drivers.acfs               ONLINE     ONLINE       grac42
ora.evmd                       ONLINE     ONLINE       grac42
ora.gipcd                      ONLINE     ONLINE       grac42
ora.gpnpd                      ONLINE     ONLINE       grac42
ora.mdnsd                      ONLINE     ONLINE       grac42
--> OHASD stack is up and running
[grid@grac42 grac42]$ crs | grep grac42
NAME                           TARGET     STATE        SERVER       STATE_DETAILS
------------------------------ ---------- ------------ ------------ ------------------
ora.ASMLIB_DG.dg               ONLINE     ONLINE       grac42
ora.DATA.dg                    ONLINE     ONLINE       grac42
ora.FRA.dg                     ONLINE     ONLINE       grac42
ora.LISTENER.lsnr              ONLINE     OFFLINE      grac42
ora.OCR.dg                     ONLINE     ONLINE       grac42
ora.SSD.dg                     ONLINE     ONLINE       grac42
ora.asm                        ONLINE     ONLINE       grac42       Started
ora.gsd                        OFFLINE    OFFLINE      grac42
ora.net1.network               ONLINE     OFFLINE      grac42
ora.ons                        ONLINE     OFFLINE      grac42
ora.registry.acfs              ONLINE     ONLINE       grac42
ora.grac4.db                   ONLINE     OFFLINE      grac42       Instance Shutdown
ora.grac4.grac42.svc           ONLINE     OFFLINE      grac42
ora.grac42.vip                 ONLINE     INTERMEDIATE grac43       FAILED OVER
--> ora.net1.network is OFFLINE
    ora.grac42.vip is in status INTERMEDIATE - FAILED OVER to grac43
Check messages logged for resource ora.net1.network from 2014-05-31 11:00:00 - 2014-05-31 11:59:59
[grid@grac42 grac42]$ fn.sh "ora.net1.network" | egrep '2014-05-31 11|TraceFileName'
TraceFileName: ./agent/crsd/orarootagent_root/orarootagent_root.log
2014-05-31 11:58:27.899: [    AGFW][1829066496]{2:12808:5} Agent received the message: RESOURCE_START[ora.net1.network grac42 1] ID 4098:403
2014-05-31 11:58:27.899: [    AGFW][1829066496]{2:12808:5} Preparing START command for: ora.net1.network grac42 1
2014-05-31 11:58:27.899: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 state changed from: OFFLINE to: STARTING
2014-05-31 11:58:27.917: [    AGFW][1826965248]{2:12808:5} Command: start for resource: ora.net1.network grac42 1 completed with status: SUCCESS
2014-05-31 11:58:27.919: [    AGFW][1829066496]{2:12808:5} Agent sending reply for: RESOURCE_START[ora.net1.network grac42 1] ID 4098:403
2014-05-31 11:58:27.969: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 state changed from: STARTING to: UNKNOWN
2014-05-31 11:58:27.969: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 would be continued to monitored!
2014-05-31 11:58:27.969: [    AGFW][1829066496]{2:12808:5} Started implicit monitor for [ora.net1.network grac42 1] interval=1000 delay=1000
2014-05-31 11:58:27.969: [    AGFW][1829066496]{2:12808:5} Agent sending last reply for: RESOURCE_START[ora.net1.network grac42 1] ID 4098:403
2014-05-31 11:58:27.982: [    AGFW][1829066496]{2:12808:5} Agent received the message: RESOURCE_CLEAN[ora.net1.network grac42 1] ID 4100:409
2014-05-31 11:58:27.982: [    AGFW][1829066496]{2:12808:5} Preparing CLEAN command for: ora.net1.network grac42 1
2014-05-31 11:58:27.982: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 state changed from: UNKNOWN to: CLEANING
2014-05-31 11:58:27.983: [    AGFW][1826965248]{2:12808:5} Command: clean for resource: ora.net1.network grac42 1 completed with status: SUCCESS
2014-05-31 11:58:27.984: [    AGFW][1829066496]{2:12808:5} Agent sending reply for: RESOURCE_CLEAN[ora.net1.network grac42 1] ID 4100:409
2014-05-31 11:58:27.984: [    AGFW][1829066496]{2:12808:5} ora.net1.network grac42 1 state changed from: CLEANING to: OFFLINE
Debug with cluvfy
Run cluvfy on the failing node
[grid@grac42 grac42]$ cluvfy comp nodereach -n grac42
Verifying node reachability
Checking node reachability...
PRVF-6006 : Unable to reach any of the nodes
PRKC-1071 : Nodes "grac42" did not respond to ping in "3" seconds,
PRKN-1035 : Host "grac42" is unreachable
Verification of node reachability was unsuccessful on all the specified nodes.
Debug with OS commands
[grid@grac42 NET]$ /bin/ping -s 1500 -c 2 -I 192.168.1.102 grac42
bind: Cannot assign requested address
--> Here we are failing as eth1 is not up and running
[grid@grac42 NET]$ /bin/ping -s 1500 -c 2 -I 192.168.1.102 grac41
ping: unknown host grac41
--> Here we are failing as the nameserver cannot be reached
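A possible recovery once the root cause is understood, assuming eth1 simply has to be brought up again (resource names match this cluster - adapt them to your own setup; a full crsctl stop/start crs also works):
# ifconfig eth1 up
# crsctl start resource ora.net1.network -n grac42
The VIP that failed over to grac43 can afterwards be relocated back to grac42 once the public network resource is ONLINE.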
- Resource STATE remains STARTING for a long time
- Resource process gets restarted quickly but does not successfully start at all
- Note: strace will only help for protection or connection issues.
- If there is a logical corruption you need to review the CW log files
Command used as user root : # strace -t -f -o /tmp/ohasd.trc crsctl start crs
CW status : GPnP daemon doesn't come up
Verify the OHASD stack
NAME                           TARGET     STATE        SERVER       STATE_DETAILS
------------------------------ ---------- ------------ ------------ ------------------
ora.asm                        ONLINE     OFFLINE                   Instance Shutdown
ora.cluster_interconnect.haip  ONLINE     OFFLINE
ora.crf                        ONLINE     OFFLINE
ora.crsd                       ONLINE     OFFLINE
ora.cssd                       ONLINE     OFFLINE
ora.cssdmonitor                ONLINE     OFFLINE
ora.ctssd                      ONLINE     OFFLINE
ora.diskmon                    ONLINE     OFFLINE
ora.drivers.acfs               ONLINE     ONLINE       grac41
ora.evmd                       ONLINE     OFFLINE
ora.gipcd                      ONLINE     OFFLINE
ora.gpnpd                      ONLINE     OFFLINE                   STARTING
ora.mdnsd                      ONLINE     ONLINE       grac41
--> The gpnpd daemon does not progress over time : STATE shows STARTING
Now stop CW and restart the CW startup with strace -t -f
[root@grac41 gpnpd]# strace -t -f -o /tmp/ohasd.trc crsctl start crs
[root@grac41 gpnpd]# grep -i gpnpd /tmp/ohasd.trc | more
Check whether the gpnpd wrapper script and gpnpd.bin were scheduled for running :
[root@grac41 log]# grep -i execve /tmp/ohasd.trc | grep gpnp
9866  08:13:56 execve("/u01/app/11204/grid/bin/gpnpd", ["/u01/app/11204/grid/bin/gpnpd"], [/* 72 vars */] <unfinished ...>
9866  08:13:56 execve("/u01/app/11204/grid/bin/gpnpd.bin", ["/u01/app/11204/grid/bin/gpnpd.bi"...], [/* 72 vars */] <unfinished ...>
11017 08:16:01 execve("/u01/app/11204/grid/bin/gpnpd", ["/u01/app/11204/grid/bin/gpnpd"], [/* 72 vars */] <unfinished ...>
11017 08:16:01 execve("/u01/app/11204/grid/bin/gpnpd.bin", ["/u01/app/11204/grid/bin/gpnpd.bi"...], [/* 72 vars */] <unfinished ...>
--> gpnpd.bin was scheduled 2x - it seems we have a problem here, check the return codes
Check ohasd.trc for errors like:
$ egrep 'EACCES|ENOENT|EADDRINUSE|ECONNREFUSED|EPERM' /tmp/ohasd.trc
Check ohasd.trc for certain return codes
# grep EACCES /tmp/ohasd.trc
# grep ENOENT ohasd.trc        # returns a lot of info
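A quick way to get an overview before grepping for individual codes is to count how often each errno shows up in the trace; the pattern below simply matches the symbolic errno names strace prints after "= -1":
# grep -oE '= -1 E[A-Z]+' /tmp/ohasd.trc | sort | uniq -c | sort -rn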
Linux error codes leading to a failed CW startup
EACCES       : open("/u01/app/11204/grid/log/grac41/gpnpd/gpnpdOUT.log", O_RDWR|O_CREAT|O_APPEND, 0644) = -1 EACCES (Permission denied)
ENOENT       : stat("/u01/app/11204/grid/log/grac41/ohasd", 0x7fff17d68f40) = -1 ENOENT (No such file or directory)
EADDRINUSE   : [pid  7391] bind(6, {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_MDNSD"}, 110) = -1 EADDRINUSE (Address already in use)
ECONNREFUSED : [pid  7391] connect(6, {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_MDNSD"}, 110) = -1 ECONNREFUSED (Connection refused)
EPERM        : [pid  7391] unlink("/var/tmp/.oracle/sgrac41DBG_MDNSD") = -1 EPERM (Operation not permitted)
Check for logfile and PID file usage
PID file usage :
# grep "\.pid" ohasd.trc
Successful open of mdns/init/grac41.pid through MDNSD
9848  08:13:55 stat("/u01/app/11204/grid/mdns/init/grac41.pid",  <unfinished ...>
9848  08:13:55 stat("/u01/app/11204/grid/mdns/init/grac41.pid",  <unfinished ...>
9848  08:13:55 access("/u01/app/11204/grid/mdns/init/grac41.pid", F_OK <unfinished ...>
9848  08:13:55 statfs("/u01/app/11204/grid/mdns/init/grac41.pid",  <unfinished ...>
9848  08:13:55 open("/u01/app/11204/grid/mdns/init/grac41.pid", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
9841  08:13:56 stat("/u01/app/11204/grid/mdns/init/grac41.pid", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
9841  08:13:56 stat("/u01/app/11204/grid/mdns/init/grac41.pid", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
9841  08:13:56 access("/u01/app/11204/grid/mdns/init/grac41.pid", F_OK) = 0
9841  08:13:56 statfs("/u01/app/11204/grid/mdns/init/grac41.pid", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9900906, f_bfree=2564283, f_bavail=2061346, f_files=2514944, f_ffree=2079685, f_fsid={1657171729, 223082106}, f_namelen=255, f_frsize=4096}) = 0
9841  08:13:56 open("/u01/app/11204/grid/mdns/init/grac41.pid", O_RDONLY) = 27
9845  08:13:56 open("/var/tmp/.oracle/mdnsd.pid", O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
Failed open of gpnp/init/grac41.pid through GPNPD
9842  08:16:00 stat("/u01/app/11204/grid/gpnp/init/grac41.pid",  <unfinished ...>
9842  08:16:00 stat("/u01/app/11204/grid/gpnp/init/grac41.pid",  <unfinished ...>
9842  08:16:00 access("/u01/app/11204/grid/gpnp/init/grac41.pid", F_OK <unfinished ...>
9842  08:16:00 statfs("/u01/app/11204/grid/gpnp/init/grac41.pid",  <unfinished ...>
9842  08:16:00 open("/u01/app/11204/grid/gpnp/init/grac41.pid", O_RDONLY <unfinished ...>
9860  08:16:01 stat("/u01/app/11204/grid/gpnp/init/grac41.pid",  <unfinished ...>
9860  08:16:01 stat("/u01/app/11204/grid/gpnp/init/grac41.pid",  <unfinished ...>
9860  08:16:01 access("/u01/app/11204/grid/gpnp/init/grac41.pid", F_OK <unfinished ...>
9860  08:16:01 statfs("/u01/app/11204/grid/gpnp/init/grac41.pid",  <unfinished ...>
9860  08:16:01 open("/u01/app/11204/grid/gpnp/init/grac41.pid", O_RDONLY <unfinished ...>
Successful open of logfile log/grac41/mdnsd/mdnsd.log through MDNSD
# grep "\.log" /tmp/ohasd.trc        # this one is very helpful
9845  08:13:55 open("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", O_WRONLY|O_APPEND) = 4
9845  08:13:55 stat("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", {st_mode=S_IFREG|0644, st_size=509983, ...}) = 0
9845  08:13:55 chmod("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", 0644) = 0
9845  08:13:55 stat("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", {st_mode=S_IFREG|0644, st_size=509983, ...}) = 0
9845  08:13:55 chmod("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", 0644) = 0
9845  08:13:55 stat("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", {st_mode=S_IFREG|0644, st_size=509983, ...}) = 0
9845  08:13:55 stat("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", {st_mode=S_IFREG|0644, st_size=509983, ...}) = 0
9845  08:13:55 access("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", F_OK) = 0
9845  08:13:55 statfs("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9900906, f_bfree=2564312, f_bavail=2061375, f_files=2514944, f_ffree=2079685, f_fsid={1657171729, 223082106}, f_namelen=255, f_frsize=4096}) = 0
9845  08:13:55 open("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", O_WRONLY|O_APPEND) = 4
9845  08:13:55 stat("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log", {st_mode=S_IFREG|0644, st_size=509983, ...}) = 0
9845  08:13:55 stat("/u01/app/11204/grid/log/grac41/mdnsd/mdnsd.log",  <unfinished ...>
Failed open of logfile log/grac41/gpnpd/gpnpdOUT.log through GPNPD
9866  08:13:56 open("/u01/app/11204/grid/log/grac41/gpnpd/gpnpdOUT.log", O_RDWR|O_CREAT|O_APPEND, 0644 <unfinished ...>
--> We need to get a file descriptor back from the open call
9842  08:15:56 stat("/u01/app/11204/grid/log/grac41/alertgrac41.log", {st_mode=S_IFREG|0664, st_size=1877580, ...}) = 0
9842  08:15:56 stat("/u01/app/11204/grid/log/grac41/alertgrac41.log", {st_mode=S_IFREG|0664, st_size=1877580, ...}) = 0
Checking IPC socket usage
Successful opening of IPC sockets through the MDNSD process
# grep MDNSD /tmp/ohasd.trc
9849  08:13:56 chmod("/var/tmp/.oracle/sgrac41DBG_MDNSD", 0777) = 0
9862  08:13:56 access("/var/tmp/.oracle/sgrac41DBG_MDNSD", F_OK <unfinished ...>
9862  08:13:56 connect(28, {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_MDNSD"}, 110 <unfinished ...>
9849  08:13:56 <... getsockname resumed> {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_MDNSD"}, [36]) = 0
9849  08:13:56 chmod("/var/tmp/.oracle/sgrac41DBG_MDNSD", 0777 <unfinished ...>
--> connect was successful at 08:13:56 - further processing with system calls like bind() will be seen in the trace
    No further connect requests are happening for /var/tmp/.oracle/sgrac41DBG_MDNSD
Complete log including a successful connect
9862  08:13:56 connect(28, {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_MDNSD"}, 110 <unfinished ...>
9834  08:13:56 <... times resumed> {tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 435715860
9862  08:13:56 <... connect resumed> ) = 0
9862  08:13:56 connect(28, {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_MDNSD"}, 110 <unfinished ...>
9834  08:13:56 <... times resumed> {tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 435715860
9862  08:13:56 <... connect resumed> ) = 0
..
9849  08:13:55 bind(6, {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_MDNSD"}, 110) = 0
..
9849  08:13:55 listen(6, 500 <unfinished ...>
9731  08:13:55 nanosleep({0, 1000000},  <unfinished ...>
9849  08:13:55 <... listen resumed> ) = 0
--> After a successful listen system call clients can connect
    To allow clients to connect we need successful connect(), bind() and listen() system calls !
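To follow the life cycle of one particular socket in a large strace file, it can help to filter only the relevant system calls for that path; a simple sketch using the GPNPD socket name from this example:
# egrep 'sgrac41DBG_GPNPD' /tmp/ohasd.trc | egrep 'bind|listen|connect|unlink|chmod'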
Failed opening of IPC sockets through the GPNPD process
# grep GPNPD /tmp/ohasd.trc
9924  08:14:37 access("/var/tmp/.oracle/sgrac41DBG_GPNPD", F_OK) = 0
9924  08:14:37 connect(30, {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_GPNPD"}, 110) = -1 ECONNREFUSED (Connection refused)
9924  08:14:37 access("/var/tmp/.oracle/sgrac41DBG_GPNPD", F_OK) = 0
9924  08:14:37 connect(30, {sa_family=AF_FILE, path="/var/tmp/.oracle/sgrac41DBG_GPNPD"}, 110) = -1 ECONNREFUSED (Connection refused)
--> The connect request was unsuccessful and was repeated again and again.
- Resource STATE remains STARTING for a long time
- Resource process gets restarted quickly but does not successfully start at all
- Note: strace will only help for protection or connection issues.
- If there is a logical corruption you need to review the CW log files
[root@grac41 .oracle]# crsi
NAME                           TARGET     STATE        SERVER       STATE_DETAILS
------------------------------ ---------- ------------ ------------ ------------------
ora.asm                        ONLINE     OFFLINE                   Instance Shutdown
ora.cluster_interconnect.haip  ONLINE     OFFLINE
ora.crf                        ONLINE     OFFLINE
ora.crsd                       ONLINE     OFFLINE
ora.cssd                       ONLINE     OFFLINE
ora.cssdmonitor                ONLINE     OFFLINE
ora.ctssd                      ONLINE     OFFLINE
ora.diskmon                    ONLINE     OFFLINE
ora.drivers.acfs               ONLINE     OFFLINE
ora.evmd                       ONLINE     OFFLINE
ora.gipcd                      ONLINE     OFFLINE
ora.gpnpd                      ONLINE     OFFLINE                   STARTING
ora.mdnsd                      ONLINE     ONLINE       grac41
Check whether the related process is running
[root@grac41 .oracle]# ps -elf | egrep "PID|d.bin|ohas|oraagent|orarootagent|cssdagent|cssdmonitor" | grep -v grep
F S UID      PID   PPID  C PRI  NI ADDR  SZ     WCHAN  STIME TTY    TIME     CMD
4 S root     7023     1  0  80   0    -   2846 pipe_w 06:01 ?      00:00:00 /bin/sh /etc/init.d/init.ohasd run
0 S grid    17764 10515  0  80   0    -   1113 wait   12:11 pts/8  00:00:00 strace -t -f -o /tmp/mdnsd.trc /u01/app/11204/grid/bin/mdnsd
0 S grid    17767 17764  0  80   0    -  78862 poll_s 12:11 ?      00:00:00 /u01/app/11204/grid/bin/mdnsd.bin
4 S root    18501     1 23  80   0    - 176836 futex_ 12:13 ?      00:05:07 /u01/app/11204/grid/bin/ohasd.bin reboot
4 S grid    18567     1  1  80   0    - 170697 futex_ 12:13 ?      00:00:14 /u01/app/11204/grid/bin/oraagent.bin
4 S root    20306     1  0 -40   -    - 160927 futex_ 12:19 ?      00:00:03 /u01/app/11204/grid/bin/cssdmonitor
4 S root    21396     1  0  80   0    - 163115 futex_ 12:23 ?      00:00:00 /u01/app/11204/grid/bin/orarootagent.bin
--> gpnpd.bin is not running - but was restarted very often
Starting the gpnpd.bin process with strace attached : $ strace -t -f -o /tmp/gpnpd.trc /u01/app/11204/grid/bin/gpnpd
Note this is very, very dangerous as you need to know which user starts this process.
If you run this process as the wrong user, the OLR, OCR, IPC socket privileges and log file locations can be corrupted !
Before tracing a single process you may run strace -t -f -o ohasd.trc crsctl start crs ( see the above chapter ), as this command always starts the processes as the correct user, in the correct order, and pulls up the needed resources.
Run the following commands only on your test system and only as a last resort.
Manually start gpnpd as user grid with strace attached!
[grid@grac41 grac41]$ strace -t -f -o /tmp/gpnpd.trc /u01/app/11204/grid/bin/gpnpd
Unable to open file /u01/app/11204/grid/log/grac41/gpnpd/gpnpdOUT.log: %
[grid@grac41 grac41]$ egrep 'EACCES|ENOENT|EADDRINUSE|ECONNREFUSED|EPERM' /tmp/gpnpd.trc
25251 12:37:39 open("/u01/app/11204/grid/log/grac41/gpnpd/gpnpdOUT.log", O_RDWR|O_CREAT|O_APPEND, 0644) = -1 EACCES (Permission denied)
==> Fix: # chown grid:oinstall /u01/app/11204/grid/log/grac41/gpnpd/gpnpdOUT.log
Repeat the above command as long as you still see errors
[grid@grac41 grac41]$ strace -t -f -o /tmp/gpnpd.trc /u01/app/11204/grid/bin/gpnpd
[grid@grac41 grac41]$ egrep 'EACCES|ENOENT|EADDRINUSE|ECONNREFUSED|EPERM' /tmp/gpnpd.trc
27089 12:44:35 connect(45, {sa_family=AF_FILE, path="/var/tmp/.oracle/sprocr_local_conn_0_PROC"}, 110) = -1 ENOENT (No such file or directory)
27089 12:44:58 connect(45, {sa_family=AF_FILE, path="/var/tmp/.oracle/sprocr_local_conn_0_PROC"}, 110) = -1 ENOENT (No such file or directory)
..
32486 13:03:52 connect(45, {sa_family=AF_FILE, path="/var/tmp/.oracle/sprocr_local_conn_0_PROC"}, 110) = -1 ENOENT (No such file or directory)
--> The connect was unsuccessful - check the IPC socket protections !
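ENOENT means the socket file does not even exist in /var/tmp/.oracle (it is normally created by the OHASD stack); a quick check, comparing against a healthy node and verifying that ohasd.bin is running at all:
# ls -l /var/tmp/.oracle | grep procr_local_conn
# ps -ef | grep ohasd.bin | grep -v grep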
Note: strace will only help for protection or connection issues.
If there is a logical corruption you need to review the CW log files.
Details of a logical OCR corruption with a PROCL-5 error : ./gpnpd/gpnpd.log
..
[   CLWAL][3606726432]clsw_Initialize: OLR initlevel [70000]
[  OCRAPI][3606726432]a_init:10: AUTH LOC [/u01/app/11204/grid/srvm/auth]
[  OCRMSG][3606726432]prom_init: Successfully registered comp [OCRMSG] in clsd.
2014-05-20 07:57:51.118: [  OCRAPI][3606726432]a_init:11: Messaging init successful.
[  OCRCLI][3606726432]oac_init: Successfully registered comp [OCRCLI] in clsd.
2014-05-20 07:57:51.118: [  OCRCLI][3606726432]proac_con_init: Local listener using IPC. [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))]
2014-05-20 07:57:51.119: [  OCRCLI][3606726432]proac_con_init: Successfully connected to the server
2014-05-20 07:57:51.119: [  OCRCLI][3606726432]proac_con_init: Post sema. Con count [1]
2014-05-20 07:57:51.120: [  OCRAPI][3606726432]a_init:12: Client init successful.
2014-05-20 07:57:51.120: [  OCRAPI][3606726432]a_init:21: OCR init successful. Init Level [7]
2014-05-20 07:57:51.120: [  OCRAPI][3606726432]a_init:2: Init Level [7]
2014-05-20 07:57:51.132: [  OCRCLI][3606726432]proac_con_init: Post sema. Con count [2]
[  clsdmt][3595089664]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=grac41DBG_GPNPD))
2014-05-20 07:57:51.133: [  clsdmt][3595089664]PID for the Process [31034], connkey 10
2014-05-20 07:57:51.133: [  clsdmt][3595089664]Creating PID [31034] file for home /u01/app/11204/grid host grac41 bin gpnp to /u01/app/11204/grid/gpnp/init/
2014-05-20 07:57:51.133: [  clsdmt][3595089664]Writing PID [31034] to the file [/u01/app/11204/grid/gpnp/init/grac41.pid]
2014-05-20 07:57:52.108: [    GPNP][3606726432]clsgpnpd_validateProfile: [at clsgpnpd.c:2919] GPnPD taken cluster name 'grac4'
2014-05-20 07:57:52.108: [    GPNP][3606726432]clsgpnpd_openLocalProfile: [at clsgpnpd.c:3477] Got local profile from file cache provider (LCP-FS).
2014-05-20 07:57:52.111: [    GPNP][3606726432]clsgpnpd_openLocalProfile: [at clsgpnpd.c:3532] Got local profile from OLR cache provider (LCP-OLR).
2014-05-20 07:57:52.113: [    GPNP][3606726432]procr_open_key_ext: OLR api procr_open_key_ext failed for key SYSTEM.GPnP.profiles.peer.pending
2014-05-20 07:57:52.113: [    GPNP][3606726432]procr_open_key_ext: OLR current boot level : 7
2014-05-20 07:57:52.113: [    GPNP][3606726432]procr_open_key_ext: OLR error code : 5
2014-05-20 07:57:52.126: [    GPNP][3606726432]procr_open_key_ext: OLR error message : PROCL-5: User does not have permission to perform a local registry operation on this key. Authentication error [User does not have permission to perform this operation] [0]
2014-05-20 07:57:52.126: [    GPNP][3606726432]clsgpnpco_ocr2profile: [at clsgpnpco.c:578] Result: (58) CLSGPNP_OCR_ERR. Failed to open requested OLR Profile.
2014-05-20 07:57:52.127: [    GPNP][3606726432]clsgpnpd_lOpen: [at clsgpnpd.c:1734] Listening on ipc://GPNPD_grac41
2014-05-20 07:57:52.127: [    GPNP][3606726432]clsgpnpd_lOpen: [at clsgpnpd.c:1743] GIPC gipcretFail (1) gipcListen listen failure on
2014-05-20 07:57:52.127: [ default][3606726432]GPNPD failed to start listening for GPnP peers.
2014-05-20 07:57:52.135: [    GPNP][3606726432]clsgpnpd_term: [at clsgpnpd.c:1344] STOP GPnPD terminating. Closing connections...
2014-05-20 07:57:52.137: [ default][3606726432]clsgpnpd_term STOP terminating.
2014-05-20 07:57:53.136: [  OCRAPI][3606726432]a_terminate:1:current ref count = 1
2014-05-20 07:57:53.136: [  OCRAPI][3606726432]a_terminate:1:current ref count = 0
--> Fatal OLR error ==> OLR is corrupted ==> GPnPD terminating.
For details on how to fix the PROCL-5 error please read the following link.