Friday, November 9, 2012

11gR2 - Restore loss of CRS Diskgroup OCR/VOTEDISK in ASM environment



1. Locate the latest automatic OCR backup

When using a non-shared CRS home, automatic OCR backups can be located on any node of the cluster, so all nodes need to be checked for the most recent backup:

$ ls -lrt $CRS_HOME/cdata/rac_cluster1/
-rw------- 1 root root 7331840 Mar 10 18:52 week.ocr
-rw------- 1 root root 7651328 Mar 26 01:33 week_.ocr
-rw------- 1 root root 7651328 Mar 29 01:33 day.ocr
-rw------- 1 root root 7651328 Mar 30 01:33 day_.ocr
-rw------- 1 root root 7651328 Mar 30 01:33 backup02.ocr
-rw------- 1 root root 7651328 Mar 30 05:33 backup01.ocr
-rw------- 1 root root 7651328 Mar 30 09:33 backup00.ocr
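
To avoid reading the listing by eye on every node, the newest backup in a directory can be picked out by modification time. The following is a minimal sketch, not an Oracle tool: `newest_ocr_backup` is a hypothetical helper name, and it assumes a POSIX shell and file names without newlines. Run it on each node and compare the results.

```shell
# Hypothetical helper: print the most recently modified *.ocr file
# in the directory given as $1 (prints nothing if there is none).
newest_ocr_backup() {
    # ls -t sorts newest first; errors (e.g. no matching files) are silenced.
    ls -t "$1"/*.ocr 2>/dev/null | head -n 1
}

# Example (as root, since the backup files are owned by root):
# newest_ocr_backup "$CRS_HOME/cdata/rac_cluster1"
```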


2. Make sure the Grid Infrastructure is shut down on all nodes

Given that the OCR disk group is missing, the GI stack will not be functional on any node; however, various daemon processes may still be running. On each node, shut down the GI stack using the force (-f) option:

# $CRS_HOME/bin/crsctl stop crs -f


3. Start the CRS stack in exclusive mode

On the node that has the most recent OCR backup, log on as root and start CRS in exclusive mode. This mode allows ASM to start and stay up without the presence of a Voting disk and without the CRS daemon process (crsd.bin) running.

11.2.0.1:

# $CRS_HOME/bin/crsctl start crs -excl
...CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-2676: Start of 'ora.asm' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'racnode1'
CRS-2676: Start of 'ora.crsd' on 'racnode1' succeeded

11.2.0.2:

# $CRS_HOME/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
...
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'auw2k3'
CRS-2672: Attempting to start 'ora.ctssd' on 'racnode1'
CRS-2676: Start of 'ora.drivers.acfs' on 'racnode1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'racnode1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'racnode1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-2676: Start of 'ora.asm' on 'racnode1' succeeded
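
Because the exclusive-mode start differs between the two releases, a wrapper script can select the options from the version string. This is only a sketch: `excl_start_args` is a hypothetical helper, the version is passed in by the caller, and only the two releases shown above are covered. Any other version is rejected rather than guessed.

```shell
# Hypothetical helper: print the 'crsctl start crs' options for an
# exclusive-mode start, based on the release given as $1.
excl_start_args() {
    case $1 in
        11.2.0.1*) echo "-excl" ;;
        11.2.0.2*) echo "-excl -nocrs" ;;
        *) echo "unknown version: $1" >&2; return 1 ;;
    esac
}

# Example (as root):
# $CRS_HOME/bin/crsctl start crs $(excl_start_args 11.2.0.2)
```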


4. Label the CRS disk for ASMLIB use

If using ASMLIB, the disk to be used for the CRS disk group needs to be stamped first. As the root user:

# /usr/sbin/oracleasm createdisk ASMD40 /dev/sdh1

Writing disk header: done
Instantiating disk: done


5. Create the CRS diskgroup via sqlplus

The disk group can now be (re-)created via sqlplus as the grid user. The compatible.asm attribute must be set to 11.2 in order for the disk group to be used by CRS:

$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 30 11:47:24 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create diskgroup CRS external redundancy disk 'ORCL:ASMD40' attribute 'COMPATIBLE.ASM' = '11.2';

Diskgroup created.

SQL> exit


6. Restore the latest OCR backup

Now that the CRS disk group is created and mounted, the OCR can be restored. This must be done as the root user:

# cd $CRS_HOME/cdata/rac_cluster1/
# $CRS_HOME/bin/ocrconfig -restore backup00.ocr


7. Start the CRS daemon on the current node (11.2.0.1 only)

Now that the OCR has been restored, the CRS daemon can be started; this is needed to recreate the Voting file. Skip this step for 11.2.0.2.

# $CRS_HOME/bin/crsctl start res ora.crsd -init
CRS-2672: Attempting to start 'ora.crsd' on 'racnode1'
CRS-2676: Start of 'ora.crsd' on 'racnode1' succeeded


8. Recreate the Voting file

The Voting file needs to be initialized in the CRS disk group:

# $CRS_HOME/bin/crsctl replace votedisk +CRS
Successful addition of voting disk 00caa5b9c0f54f3abf5bd2a2609f09a9.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced


9. Recreate the SPFILE for ASM (optional)

Prepare a pfile (e.g. /tmp/asm_pfile.ora) with the ASM startup parameters; these may vary from the example below. If in doubt, consult the ASM alert log, as each ASM instance startup lists all non-default parameter values. Note that the last startup of ASM (in step 3, via the exclusive-mode CRS start) will not have used an SPFILE, so a startup from before the loss of the CRS disk group needs to be located.

*.asm_power_limit=1
*.diagnostic_dest='/u01/app/oragrid'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'

Now the SPFILE can be created using this PFILE:

$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.1.0 Production on Tue Mar 30 11:52:39 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create spfile='+CRS' from pfile='/tmp/asm_pfile.ora';

File created.

SQL> exit


10. Shut down CRS

Since CRS is running in exclusive mode, it needs to be shut down to allow CRS to run on all nodes again. Use of the force (-f) option may be required:

# $CRS_HOME/bin/crsctl stop crs -f
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'auw2k3' has completed
CRS-4133: Oracle High Availability Services has been stopped.


11. Rescan ASM disks

If using ASMLIB, rescan all ASM disks on each node as the root user:

# /usr/sbin/oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "ASMD40"


12. Start CRS

As the root user, start CRS on all cluster nodes:

# $CRS_HOME/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.


13. Verify CRS 

To verify that CRS is fully functional again:
# $CRS_HOME/bin/crsctl check cluster -all
**************************************************************
racnode1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************



Thursday, November 8, 2012

ASM 11gR2 Commands

Check for free space in ASM disks

---------------------------

set lines 255
col path for a35
col Diskgroup for a15
col DiskName for a20
col disk# for 999
col total_mb for 999,999,999
col free_mb for 999,999,999
compute sum of total_mb on DiskGroup
compute sum of free_mb on DiskGroup
break on DiskGroup skip 1 on report

set pages 255

select a.name DiskGroup, b.disk_number Disk#, b.name DiskName, b.total_mb, b.free_mb, b.path, b.header_status
from v$asm_disk b, v$asm_diskgroup a
where a.group_number (+) =b.group_number
order by b.group_number, b.disk_number, b.name
/
---------------------------------------

ASM Parameters

*.db_cache_size=64m
*.large_pool_size=12M
*.shared_pool_size=128M
*.processes=300
The equation for the processes parameter is:
processes = 25 + (10 + max number of concurrent database file creations and file extend operations possible) * n

Where n is the number of databases connecting to ASM.
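
As a worked example of the formula above, here is a small shell sketch. `asm_processes` is a hypothetical helper name, and the input figures are made up for illustration.

```shell
# Hypothetical helper implementing: processes = 25 + (10 + m) * n
#   $1 = m, max number of concurrent file creations and file extend operations
#   $2 = n, number of databases connecting to ASM
asm_processes() {
    m=$1
    n=$2
    echo $(( 25 + (10 + m) * n ))
}

asm_processes 5 2   # 25 + (10 + 5) * 2, prints 55
```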

Check for space on your ASM instance



Here’s a simple script to see how much space you have at the disk and disk group level.

This script should work on all operating systems, but it has only been tested on Linux.

The only portion that you will have to change is the ‘ps -ef’ line.

 -----------------------
export DB=$(ps -ef | grep +ASM | grep -i pmon | awk '{print $8}' | sed -e 's/asm_pmon_//g')
export ORACLE_SID=${DB}
export ORAENV_ASK=NO
. oraenv
sqlplus -s / as sysasm <<!!
col name for a15
col path for a20
set lines 122 pages 66
col AU for 99 hea 'AU|MB'
col state for a12
col compatibility for a10 hea 'ASM|Compat'
col database_compatibility for a10 hea 'Database|Compat'
col pct_free for 999.99 head 'Pct|Free'
col block_size for 99,999 head 'Block|Size'
col Total_GB for 999,999.99 head 'Total|GB'
col Free_GB for 999,999.99 head 'Free|GB'
select name, path, total_mb, free_mb,
       round(free_mb/total_mb*100,2) pct_Free
from v\$asm_disk
where total_mb >1
order by name;
select name, state, round(total_mb/1024,2) Total_GB, round(free_mb/1024,2) Free_GB,
       round(free_mb/total_mb*100,2) pct_Free,
       allocation_unit_size/1024/1024 AU, compatibility, database_compatibility
from v\$asm_diskgroup
where total_mb > 1;
!!
----------------------
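
If the figures are to be checked by a script rather than read from the report, the same percentage calculation can be done in the shell. This is a sketch under assumptions: `low_space_groups` is a hypothetical helper, and the input is assumed to be whitespace-separated lines of name, total_mb and free_mb (for example, spooled from the query above with headings switched off). The figures in the example are made up.

```shell
# Hypothetical helper: read "name total_mb free_mb" lines on stdin and print
# the names whose free-space percentage is below the threshold given as $1.
low_space_groups() {
    awk -v limit="$1" '$2 > 0 && ($3 / $2 * 100) < limit { print $1 }'
}

# Example with made-up figures: CRS is 50% free, DATA is 5% free.
printf 'CRS 1024 512\nDATA 204800 10240\n' | low_space_groups 10   # prints DATA
```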

The following script checks the balance of the data across the disks:
set pages 9999 lines 200
column name format a40
select a.name, b.disk_kffxp disk, count(disk_kffxp) blocks
from
v$asm_alias a
, x$kffxp b
, v$asm_file c
where
a.group_number=b.group_kffxp
and a.group_number=c.group_number
and a.file_number=c.file_number
and a.file_number=b.number_kffxp
and c.type in ('DATAFILE','TEMPFILE','ONLINELOG')
group by a.name, b.disk_kffxp
order by a.name, count(disk_kffxp) desc;