CRS does not start GIPC error: [29] msg [gipcretConnectionRefused]
- Check your disk space using: # df
- Check whether a firewall is running: # service iptables status ( <-- this command is very important )
- Use nslookup and ping to verify your Cluster Interconnect
Errors:
   GIPC reports error [29] msg [gipcretConnectionRefused]
   CHM reports clsu_get_private_ip failed
Check CRS status
[root@grac41 Desktop]#  crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
[root@grac41 network-scripts]# my_crs_stat_init
NAME                           TARGET     STATE      SERVER       STATE_DETAILS
-----------------------------  ---------- ---------- ------------ ------------------
ora.asm                        ONLINE     OFFLINE                 Instance Shutdown
ora.cluster_interconnect.haip  ONLINE     OFFLINE
ora.crf                        ONLINE     ONLINE     grac41
ora.crsd                       ONLINE     OFFLINE
ora.cssd                       ONLINE     UNKNOWN    grac41
ora.cssdmonitor                ONLINE     ONLINE     grac41
ora.ctssd                      ONLINE     OFFLINE
ora.diskmon                    OFFLINE    OFFLINE
ora.drivers.acfs               ONLINE     ONLINE     grac41
ora.evmd                       ONLINE     OFFLINE
ora.gipcd                      ONLINE     ONLINE     grac41
ora.gpnpd                      ONLINE     ONLINE     grac41
ora.mdnsd                      ONLINE     ONLINE     grac41
--> The ASM, HAIP, CRSD, CTSSD, DISKMON and EVMD resources are OFFLINE!
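Picking the problem resources out of such a listing by eye gets tedious. The following is only a sketch: it assumes the listing has one resource per line with whitespace-separated NAME, TARGET and STATE columns (as my_crs_stat_init prints it), and the function name offline_resources is made up for this example. It reports only resources whose TARGET is ONLINE, so ora.diskmon (TARGET OFFLINE) is not flagged.

```shell
# Print "NAME STATE" for every ora.* resource that should be ONLINE
# (TARGET column) but is in any other state (STATE column).
# Assumption: input lines look like "NAME TARGET STATE [SERVER] [DETAILS]".
offline_resources() {
    awk '$1 ~ /^ora\./ && $2 == "ONLINE" && $3 != "ONLINE" { print $1, $3 }'
}

# Possible usage (my_crs_stat_init is the local helper script used above):
#   my_crs_stat_init | offline_resources
```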
Check traces - ohasd trace file
[root@grac41 ohasd]#  cat ohasd.log | grep -i failed
2014-04-22 15:09:17.966: [    AGFW][2735122176]{0:0:2} ora.cluster_interconnect.haip 1 1 received state from probe request. Old state = UNKNOWN, New state = FAILED
2014-04-22 15:09:30.292: [    GPNP][2745628416]clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result: (26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
2014-04-22 15:09:30.602: [    GPNP][2717640448]clsgpnp_getCachedProfileEx: [at clsgpnp.c:623] Result: (26) CLSGPNP_NO_PROFILE. Failed to get offline GPnP service profile.
--> HAIP goes to FAILED status
Try to find trace files that are updated repeatedly
- maybe some RAC process is trying to fix the network problem
[grid@grac41 grac41]$ date;  find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS  %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:40 CEST 2014
2014-04-22 13:24:30.0571859790  ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610  ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:38.0881994320  ./ohasd/ohasd.log
2014-04-22 13:24:38.3523314350  ./gipcd/gipcd.log
2014-04-22 13:24:39.0876989250  ./crfmond/crfmond.log
[grid@grac41 grac41]$ date;  find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS  %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:43 CEST 2014
2014-04-22 13:24:30.0571859790  ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610  ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:43.1007044060  ./ohasd/ohasd.log
2014-04-22 13:24:43.3668374000  ./gipcd/gipcd.log
2014-04-22 13:24:43.7580328990  ./crfmond/crfmond.log
[grid@grac41 grac41]$ date;  find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS  %h/%f\n" | sort -n | tail -5
Tue Apr 22 13:24:47 CEST 2014
2014-04-22 13:24:30.0571859790  ./gpnpd/gpnpd.log
2014-04-22 13:24:33.0756944610  ./agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2014-04-22 13:24:43.1007044060  ./ohasd/ohasd.log
2014-04-22 13:24:44.0972023860  ./crfmond/crfmond.log
2014-04-22 13:24:46.4033548850  ./gipcd/gipcd.log
--> Here we can see that ./ohasd/ohasd.log, ./gipcd/gipcd.log and ./crfmond/crfmond.log are updated every few seconds
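Instead of comparing the listings by eye, two snapshots can be diffed. This is a minimal sketch, assuming each snapshot was written with the find -printf format used above ("DATE TIME  PATH") and that the paths contain no blanks; changed_files and the snapshot file names are hypothetical.

```shell
# changed_files OLD NEW
# Print every path whose change timestamp differs between two snapshots,
# or which only appears in the newer snapshot.
# Assumption: each line is "DATE TIME  PATH" and PATH has no whitespace.
changed_files() {
    awk 'NR == FNR { seen[$3] = $1 " " $2; next }
         !($3 in seen) || seen[$3] != ($1 " " $2) { print $3 }' "$1" "$2"
}

# Possible usage:
#   find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS  %h/%f\n" > /tmp/snap1
#   sleep 5
#   find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS  %h/%f\n" > /tmp/snap2
#   changed_files /tmp/snap1 /tmp/snap2
```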
Use tail to see what's going on:
[grid@grac41 grac41]$ tail -f  ./gpnpd/gpnpd.log
2014-04-22 13:19:59.175: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:21:29.469: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:22:59.792: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:24:30.057: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:26:00.383: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:27:30.622: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:29:00.869: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:30:31.203: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:32:01.459: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
2014-04-22 13:33:31.770: [  OCRMSG][4002494208]GIPC error [29] msg [gipcretConnectionRefused]
[grid@grac41 grac41]$  tail -f  ./ohasd/ohasd.log
2014-04-22 13:33:42.806: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:47.817: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:52.839: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:33:57.848: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:03.859: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:09.874: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:15.881: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:20.900: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:25.920: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-04-22 13:34:30.934: [GIPCHDEM][2222126848]gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0x2d03370 [0000000000000010] { gipchaContext : host 'grac41', name 'CLSFRAME_grac4', luid '57127705-00000000', numNode 0, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
[grid@grac41 grac41]$ tail -f  ./crfmond/crfmond.log
[   CLWAL][467654400]clsw_Initialize: OLR initlevel [70000]
2014-04-22 13:34:49.349: [    CRFM][467654400]crfm_connstr: clsu_get_private_ip failed(7).
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_connect_to: send fail(gipcret: 13)
2014-04-22 13:34:49.458: [    CRFM][467654400]crfmctx dump follows
2014-04-22 13:34:49.458: [    CRFM][467654400]****************************
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: connection local name: tcp://0.0.0.0:45871
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: connection peer name:  tcp://192.168.1.101:61021
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: connaddr:  tcp://grac41:61021
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: ctype:  2
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: mytype:  0
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: hostname  grac41
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: myport:
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: rhostname
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: rport:
2014-04-22 13:34:49.458: [    CRFM][467654400]crfm_dumpctx: flags:  1
2014-04-22 13:34:49.458: [    CRFM][467654400]****************************
The traces above show that clsu_get_private_ip failed to get the private IP; the connection dump refers to tcp://192.168.1.101
Check Network status and DNS
[root@grac41 Desktop]# ifconfig
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe89:e9a2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17148 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13307 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:22041591 (21.0 MiB)  TX bytes:1211055 (1.1 MiB)
          Interrupt:9 Base address:0xd240
eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe6b:e2bd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17517 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13475 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:22191772 (21.1 MiB)  TX bytes:1230703 (1.1 MiB)
          Interrupt:5 Base address:0xd260
--> Check public and private interface for errors / Looks good
[root@grac41 Desktop]# nslookup grac41
Name:    grac41.example.com
Address: 192.168.1.101
[root@grac41 Desktop]# nslookup grac41int
Name:    grac41int.example.com
Address: 192.168.2.101
[root@grac41 Desktop]# nslookup 192.168.1.101
101.1.168.192.in-addr.arpa    name = grac41.example.com.
[root@grac41 Desktop]# nslookup  192.168.2.101
101.2.168.192.in-addr.arpa    name = grac41int.example.com.
--> DNS and network seem to be OK
Restart CRS
[root@grac41 Desktop]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'grac41'
CRS-2673: Attempting to stop 'ora.crf' on 'grac41'
CRS-2673: Attempting to stop 'ora.ctssd' on 'grac41'
CRS-2673: Attempting to stop 'ora.evmd' on 'grac41'
...
CRS-2673: Attempting to stop 'ora.gpnpd' on 'grac41'
CRS-2677: Stop of 'ora.gpnpd' on 'grac41' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'grac41' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Cleanup /var/tmp/.oracle
# rm  /var/tmp/.oracle/*
[root@grac41 Desktop]# crsctl start crs
[root@grac41 Desktop]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
--> Problem persists
Check OS logfile
# cat /var/log/messages
--> Nothing related
Run ocrcheck (and ocrdump) to check whether we can access our OCR repository
[root@grac41 Desktop]#  ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       4076
         Available space (kbytes) :     258044
         ID                       :  630679368
         Device/File Name         :       +OCR
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
Query voting disk:
[grid@grac41 grac41]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   b0e94e5d83054fe9bf58b6b98bfacd65 (/dev/asmdisk1_udev_sdf1) [OCR]
 2. ONLINE   88c2a08b4c8c4f85bf0109e0990388e4 (/dev/asmdisk1_udev_sdg1) [OCR]
 3. ONLINE   1108f9a41e814fb2bfed879ff0039dd0 (/dev/asmdisk1_udev_sdh1) [OCR]
Located 3 voting disk(s).
Debugging GIPCD and GPnPD daemons using strace 
As the GIPCD and GPnPD daemon traces get updated every 5 seconds, let's check the gipcd process with strace
# ps -elf | egrep 'gpnpd.bin|gipcd.bin'
# strace -t -f  -p 24376   2>&1  | grep '192.168' | grep eth
[pid 24872] 09:17:28 <... ioctl resumed> 200, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("10.0.2.15")}}, {"eth1", {AF_INET, inet_addr("192.168.2.101")}}, {"eth2", {AF_INET, inet_addr("192.168.1.101")}}, {"virbr0", {AF_INET, inet_addr("192.168.122.1")}}}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24870] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24872] 09:17:28 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
..
[pid 24872] 09:17:33 <... ioctl resumed> 200, {{"lo", {AF_INET, inet_addr("127.0.0.1")}}, {"eth0", {AF_INET, inet_addr("10.0.2.15")}}, {"eth1", {AF_INET, inet_addr("192.168.2.101")}}, {"eth2", {AF_INET, inet_addr("192.168.1.101")}}, {"virbr0", {AF_INET, inet_addr("192.168.122.1")}}}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
[pid 24870] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_broadaddr={AF_INET, inet_addr("192.168.1.255")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_addr={AF_INET, inet_addr("192.168.2.101")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth1", ifr_broadaddr={AF_INET, inet_addr("192.168.2.255")}}) = 0
[pid 24872] 09:17:33 <... ioctl resumed> , {ifr_name="eth2", ifr_addr={AF_INET, inet_addr("192.168.1.101")}}) = 0
..
--> Again we don't get an OS error, but the process keeps looping over the same ioctl() calls
    It seems the daemon is not happy with the information returned by the ioctl() calls and re-reads it every 5 seconds
Check GPnP profile
[root@grac41 Desktop]#  gpnptool get > profile.xml
Edit profile.xml and extract the adapter usage
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">
   <gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth1" Use="public"/>
   <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/>
Verify with ifconfig
[root@grac41 Desktop]# ifconfig | egrep 'HWaddr|inet addr'
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          inet addr:127.0.0.1  Mask:255.0.0.0
--> eth1 is using 192.168.2.101, but according to the GPnP profile it should use 192.168.1.101
    eth2 is using 192.168.1.101, but according to the GPnP profile it should use 192.168.2.101
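The comparison between the GPnP profile and ifconfig boils down to one question: does the adapter's address, masked with its netmask, give the network recorded in the profile? Below is a small sketch of that check in plain shell arithmetic. The function names network_address and adapter_matches are invented for this example, and the values have to be copied by hand from the gpnptool and ifconfig output.

```shell
# network_address IP NETMASK
# Print the IPv4 network address (bitwise AND of each octet pair).
network_address() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads on "." into $1..$4 (IP) and $5..$8 (mask).
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# adapter_matches PROFILE_NETWORK IP NETMASK
# Succeed only if the adapter's address lies in the network that the
# GPnP profile records for that adapter.
adapter_matches() {
    [ "$(network_address "$2" "$3")" = "$1" ]
}

# Possible usage with the values from above (profile: eth1 -> 192.168.1.0):
#   adapter_matches 192.168.1.0 192.168.2.101 255.255.255.0 || echo "eth1 mismatch"
```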
Problem found:
While manually editing ifcfg-eth1 and ifcfg-eth2 ( in /etc/sysconfig/network-scripts ) the HWADDR entries were filled in incorrectly
Reconfiguring/restart network and CRS
[root@grac41 network-scripts]# cat  ifcfg-eth2
HWADDR=08:00:27:89:E9:A2
IPADDR=192.168.2.101
NAME=eth2
[root@grac41 network-scripts]# cat  ifcfg-eth1
IPADDR=192.168.1.101
NAME=eth1
HWADDR=08:00:27:6B:E2:BD
After changing the HWADDR entries to match the ifconfig output above, the network looks good
[root@grac41 network-scripts]# service network restart
[root@grac41 network-scripts]# ifconfig | egrep 'HWaddr|inet addr'
eth1      Link encap:Ethernet  HWaddr 08:00:27:89:E9:A2
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
eth2      Link encap:Ethernet  HWaddr 08:00:27:6B:E2:BD
          inet addr:192.168.2.101  Bcast:192.168.2.255  Mask:255.255.255.0
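A mix-up like this can be caught before restarting the network by comparing the HWADDR line of each ifcfg file with the MAC address of the device it is supposed to configure. The following is only a sketch: ifcfg_hwaddr and hwaddr_matches are made-up helper names, and it assumes a simple HWADDR=... line (optionally quoted) in the file.

```shell
# ifcfg_hwaddr FILE
# Print the HWADDR value of an ifcfg-* file in upper case.
ifcfg_hwaddr() {
    sed -n 's/^HWADDR=//p' "$1" | tr -d '"' | tr 'a-f' 'A-F'
}

# hwaddr_matches FILE MAC
# Succeed if the HWADDR entry in FILE equals MAC (case-insensitive).
hwaddr_matches() {
    [ "$(ifcfg_hwaddr "$1")" = "$(printf '%s' "$2" | tr 'a-f' 'A-F')" ]
}

# Possible usage on Linux, where the kernel exposes the MAC under /sys:
#   hwaddr_matches /etc/sysconfig/network-scripts/ifcfg-eth1 \
#       "$(cat /sys/class/net/eth1/address)" || echo "ifcfg-eth1: HWADDR mismatch"
```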
Restart CRS
[root@grac41 network-scripts]# crsctl stop crs -f
[root@grac41 network-scripts]# crsctl start crs
[root@grac41 network-scripts]# crsctl check cluster -all
**************************************************************
grac41:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
grac42:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
grac43:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
Lessons learned
 - Verify carefully that IP addresses and network device names are consistent clusterwide
[root@gract1 Desktop]# crsi
*****  Local Resources: *****
Resource NAME                  INST   TARGET       STATE        SERVER          STATE_DETAILS
------------------------------ ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1      ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1      ONLINE       OFFLINE      -               STABLE
ora.crf                        1      ONLINE       OFFLINE      -               STABLE
ora.crsd                       1      ONLINE       OFFLINE      -               STABLE
ora.cssd                       1      ONLINE       OFFLINE      -               STABLE
ora.cssdmonitor                1      OFFLINE      OFFLINE      -               STABLE
ora.ctssd                      1      ONLINE       OFFLINE      -               STABLE
ora.diskmon                    1      OFFLINE      OFFLINE      -               STABLE
ora.drivers.acfs               1      ONLINE       ONLINE       gract1          STABLE
ora.evmd                       1      ONLINE       OFFLINE      gract1          STARTING
ora.gipcd                      1      ONLINE       OFFLINE      -               STABLE
ora.gpnpd                      1      ONLINE       OFFLINE      -               STABLE
ora.mdnsd                      1      ONLINE       OFFLINE      gract1          STARTING
ora.storage                    1      ONLINE       OFFLINE      -               STABLE
Related client trace 
2014-08-22 10:57:07.750: [  OCRMSG][2296473152]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-08-22 10:57:07.750: [  OCRMSG][2296473152]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-22 10:57:07.750: [  OCRMSG][2296473152]prom_connect: error while waiting for connection complete [24]
2014-08-22 10:57:07.821: [  OCRMSG][2296473152]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-08-22 10:57:07.821: [  OCRMSG][2296473152]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-22 10:57:07.821: [  OCRMSG][2296473152]prom_connect: error while waiting for connection complete [24]
Root Cause : File system full : 100% - No traces can be written 
# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_oel64-lv_root
                      39603624  37798864         0 100% /
tmpfs                  4194304       272   4194032   1% /dev/shm
/dev/sda1               495844    101751    368493  22% /boot
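Since a full file system silently stops the trace files from being written, it is worth checking for it automatically. A minimal sketch follows, assuming the input comes from df -P (the POSIX output format, which keeps each file system on a single line, unlike the wrapped lv_root entry above); the name full_filesystems and the default threshold of 95 percent are arbitrary choices for this example.

```shell
# full_filesystems [LIMIT]
# Read "df -P" output on stdin and print "MOUNTPOINT USE%" for every
# file system whose usage is at or above LIMIT percent (default 95).
# Assumption: mount points contain no whitespace.
full_filesystems() {
    awk -v limit="${1:-95}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Possible usage:
#   df -P -k | full_filesystems 95
```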
*****  Cluster Resources: *****
Resource NAME                  INST   TARGET       STATE        SERVER          STATE_DETAILS
------------------------------ ----   ------------ ------------ --------------- -----------------------------------------
ora.asm                        1      ONLINE       OFFLINE      -               STABLE
ora.cluster_interconnect.haip  1      ONLINE       OFFLINE      -               STABLE
ora.crf                        1      ONLINE       OFFLINE      -               STABLE
ora.crsd                       1      ONLINE       OFFLINE      -               STABLE
ora.cssd                       1      ONLINE       OFFLINE      -               STABLE
ora.cssdmonitor                1      ONLINE       ONLINE       gract2          STABLE
ora.ctssd                      1      ONLINE       OFFLINE      -               STABLE
ora.diskmon                    1      OFFLINE      OFFLINE      -               STABLE
ora.drivers.acfs               1      ONLINE       ONLINE       gract2          STABLE
ora.evmd                       1      ONLINE       INTERMEDIATE gract2          STABLE
ora.gipcd                      1      ONLINE       ONLINE       gract2          STABLE
ora.gpnpd                      1      ONLINE       ONLINE       gract2          STABLE
ora.mdnsd                      1      ONLINE       ONLINE       gract2          STABLE
ora.storage                    1      ONLINE       OFFLINE      -               STABLE
--> CSSD doesn't become ONLINE 
Client log:
2014-08-23 11:49:21.920: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:49:42.948: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:50:10.978: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:50:46.008: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:51:28.042: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
2014-08-23 11:51:28.042: [  OCRMSG][2580342528]GIPC error [29] msg [gipcretConnectionRefused]
20665 <... connect resumed> )           = 0
20665 connect(66, {sa_family=AF_FILE, path="/var/tmp/.oracle/sOHASD_UI_SOCKET"}, 110 <unfinished ...>
20665 <... connect resumed> )           = 0
20665 connect(73, {sa_family=AF_FILE, path="/var/tmp/.oracle/sprocr_local_conn_0_PROC"}, 110 <unfinished ...>
20665 <... connect resumed> )           = -1 ECONNREFUSED (Connection refused)
ocssd.log:
2014-08-23 12:32:58.427: [    CSSD][1279260416]clssnmvDHBValidateNCopy: node 1, gract1, has a disk HB, but no network HB, DHB has rcfg 304252836, wrtcnt, 3207223, LATS 4294823390, lastSeqNo 3207220, uniqueness 1408783210, timestamp 1408789980/5988764
2014-08-23 12:32:58.427: [    CSSD][1283991296]clssnmvDHBValidateNCopy: node 1, gract1, has a disk HB, but no network HB, DHB has rcfg 304252836, wrtcnt, 3207224, LATS 4294823390, lastSeqNo 3207221, uniqueness 1408783210, timestamp 1408789980/5988864
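The "has a disk HB, but no network HB" message is the key hint here: the other node can still be seen via the voting disk but not over the network. A rough sketch for pulling the affected nodes out of a CSSD log, assuming the message layout shown above; the function name nodes_without_network_hb is made up for this example.

```shell
# nodes_without_network_hb
# Read a CSSD log on stdin and print "NODE_NUMBER NODE_NAME" once for each
# node reported as having a disk heartbeat but no network heartbeat.
# Assumption: the message text matches the clssnmvDHBValidateNCopy lines above.
nodes_without_network_hb() {
    sed -n 's/.*node \([0-9][0-9]*\), \([^,]*\), has a disk HB, but no network HB.*/\1 \2/p' |
        sort -u
}

# Possible usage:
#   nodes_without_network_hb < ocssd.log
```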
- Fix : Disable Firewall
Problem: The firewall is still active on OEL 6 after running chkconfig iptables off and rebooting the system

To fix the problem you also need to disable libvirtd
# chkconfig libvirtd off
# chkconfig libvirt-guests off
# chkconfig ip6tables off
# chkconfig iptables off
#  chkconfig --list  | egrep 'iptables|ip6tables|libvirt'
ip6tables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
libvirt-guests 0:off 1:off 2:off 3:off 4:off 5:off 6:off
libvirtd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
After a reboot the firewall should now be disabled
[root@grac43 ~]# service iptables status
iptables: Firewall is not running.
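To make sure none of these services is still switched on in some runlevel, the chkconfig listing can be filtered. This is only a sketch that assumes the "name 0:off 1:off ..." layout shown above; still_enabled is a hypothetical helper name.

```shell
# still_enabled
# Read "chkconfig --list" style lines on stdin and print the name of every
# service that is still "on" in at least one runlevel.
still_enabled() {
    awk '/:on/ { print $1 }'
}

# Possible usage (no output means everything is switched off):
#   chkconfig --list | egrep 'iptables|ip6tables|libvirt' | still_enabled
```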
