Search

Tuesday, December 11, 2012

Recover archivelog gap between primary to standby using RMAN incremental backups

A Physical Standby database synchs with Primary by continuous apply of archive logs from a Primary Database. In case of an archive log gone missing or corrupt, We have to bring back the standby to sync with the primary.

When the logs missing or corrupt is less in number (say below 15), we can ship the logs which were missing in the standby site from the primary site (scp/sftp/ftp) and then we can register the log file in the standby so that the gap can be resolved.

Find the archives which are missing by issueing the following command.
SQL> select * from v$archive_gap;
This would give the gap sequences. Or you can use the v$managed_standby view to find where the log apply stuck.

SQL> select sequence#,process,status from v$managed_standby;
Here u can see status as wait for log for say sequence# 100 but your primary would've proceeded to sequence# 110

At primary
SQL> select max(sequence#) from v$archived_log;      ---> This would show you 110

Copy the logs to the standby site from the primary site
$ scp log_file_name_n.arc oracle@standby:/log/file/location/log_file_name_n.arc

At standby site
SQL> alter database register logfile '/log/file/location/log_file_name_n.arc';
logfile registered
Do the log file registration at the standby site until all the missing log files are registered. Now apply would take place and your standby will become sync with the primary.
This is easy process if you have missing or corrupt logs in lesser number. But when the difference is huge (say around 500 logs) this method is very time consuming and not a proper approach.

Else you have to rebuild the standby database from scratch.
As an enhancement from 10g, an incremental backup created with BACKUP INCREMENTAL... FROM SCN can be used to refresh the standby database with changes at the primary database since the last SCN at Standby and then managed recovery can resume i.e. Compensate for the missing archive logs.

Let us see the steps involved.
Step 1: On the primary:
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
144710998

On the standby:
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
130158742

Clearly there is a difference. But this by itself does not indicate a problem; since the standby is expected to lag behind the primary (this is an asynchronous non-real time apply setup). The real question is how much it is lagging in the terms of wall clock.

To know that use the scn_to_timestamp function to translate the SCN to a timestamp:
SQL> select scn_to_timestamp(144710998) from dual;
SCN_TO_TIMESTAMP(1447102)
-------------------------------
18-AUG-11 08.54.28.000000000 AM

Run the same query to know the timestamp associated with the SCN of the standby database as well
SQL> select scn_to_timestamp(130158742) from dual;
SCN_TO_TIMESTAMP(1301571)
-------------------------------
13-AUG-11 07.19.27.000000000 PM
Note: Run it on the primary database, since it will fail in the standby in a mounted mode
 This shows that the standby is four and half days lagging!

Step 2: [Standby] Stop the managed standby apply process:
SQL> alter database recover managed standby database cancel;
Database altered.

Step 3: [Standby] Shutdown the standby database
SQL> shut immediate

Step 4: [Primary] On the primary, take an incremental backup from the SCN number where the standby has been stuck:
RMAN> run {
2> allocate channel c1 type disk format '/u01/backup/%U.bkp';
3> backup incremental from scn 130158740 database;
4> }

Step 5: [Primary] On the primary, create a new standby controlfile:
SQL> alter database create standby controlfile as '/u01/backup/for_standby.ctl';
Database altered.

Step 6: [Primary] Copy these files to standby host:
oracle@dba1 /u01/backup]$ scp * oracle@dba2:/u01/backup

Step 7: [Standby] Bring up the instance in nomount mode:
SQL> startup nomount

Step 8: [Standby] Check the location of the controlfile:
SQL> show parameter control_files
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_files string /u01/oradata/standby_cntfile.ctl

Step 9: [Standby] Replace the controlfile with the one you just created in primary.
 $ cp /u01/backup/for_standby.ctl /u01/oradata/standby_cntfile.ctl

Step 10: [Standby] Mount the standby database:
SQL> alter database mount standby database;

Step 11: [Standby] Connect to RMAN. RMAN does not know about these files yet; so you must let it know – by a process called cataloging. Catalog these files:
$ rman target=/
RMAN> catalog start with '/u01/backup';

Step 12: Recover these files:
RMAN> recover database;

Step 13: After some time, the recovery fails with the message:
archive log filename=/u01/oradata/1_18108_697108460.dbf thread=1 sequence=18109
ORA-00310: archived log contains sequence 18108; sequence 18109 required
This happens because we have come to the last of the archived logs. The expected archived log with sequence# 18108 has not been generated yet.

Step 14: At this point exit RMAN and start managed recovery process:
SQL> alter database recover managed standby database disconnect from session;
Database altered.

Step 15: Check the SCN’s in primary and standby:
[Standby] SQL> select current_scn from v$database;
CURRENT_SCN
-----------
144747125
[Primary] SQL> select current_scn from v$database;
CURRENT_SCN
-----------
144747111
Now they are very close to each other. The standby has now caught up.

Monday, November 19, 2012

crsctl - Cluster Ready Service Control



To Get help
$crsctl -h

To Get Active Cluster Version
$crsctl query crs activeversion

To Get cluster version of given node
$crsctl query crs softwareversion [node_name]

Ex:$crsctl query crs softwareversion rac1

#crsctl start crs
#crsctl stop crs
(or)
#/etc/init.d/init.crs start
#/etc/init.d/init.crs stop

#crsctl enable crs
#crsctl disable crs
(or)
#/etc/init.d/init.crs enable
#/etc/init.d/init.crs disable

To get cluster stack health status
$crsctl check crs

To get the viability of CSS across nodes
$crsctl check cluster [-node node_name]   -- 11gR2 RACcommand

#crsctl start cluster -n HostName -- 11g R2
#crsctl stop cluster -n HostName -- 11g R2
#crsctl stop cluster -all  -- 11g R2

$ps -ef | grep d.bin
$crsctl check cssd
$crsctl check crsd
$crsctl check evmd
$crsctl check oprocd
$crsctl check ctss

#/etc/init.d/init.cssd stop
#/etc/init.d/init.cssd start

#/etc/rc.d/init.d/init.evmd
#/etc/rc.d/init.d/init.cssd
#/etc/rc.d/init.d/init.crsd

#mv /etc/rc3.d/S96init.cssd /etc/rc3.d/_S96init.cssd -- to stop cssd from autostarting after reboot

#crsctl check css votedisk
#crsctl query css votedisk -- lists the voting disks used by CSS command in 11gR2

#crsctl add css votedisk PATH
#crsctl add css votedisk PATH -force -- if Clusterware is not running
#crsctl delete css votedisk PATH
#crsctl delete css votedisk PATH -force -- if Clusterware is not running

#crsctl set css parameter_name value -- set parameters on OCR
#crsctl set css misscount 100
#crsctl unset css parameter_name -- sets CSS parameter to its default
#crsctl unset css misscount
#crsctl get css parameter_name -- gets the value of a CSS parameter
#crsctl get css disktimeout
#crsctl get css misscount
#crsctl get css reboottime

#crsctl start resources -- starts Clusterware resources
#crsctl start resource ora.DATA.dg
#crsctl stop resources -- stops Clusterware resources

$crsctl status resource
$crsctl status resource -t
$crsctl stat resource -t
#crsctl lsmodules crs -- lists CRS modules that can be used for debugging( need root privilage)
List CRSD Debug Module: AGENT
List CRSD Debug Module: AGFW
List CRSD Debug Module: CLSFRAME
List CRSD Debug Module: CLSVER
List CRSD Debug Module: CLUCLS
List CRSD Debug Module: COMMCRS
List CRSD Debug Module: COMMNS
List CRSD Debug Module: CRSAPP
List CRSD Debug Module: CRSCCL
List CRSD Debug Module: CRSCEVT
List CRSD Debug Module: CRSCOMM
List CRSD Debug Module: CRSD
List CRSD Debug Module: CRSEVT
List CRSD Debug Module: CRSMAIN
List CRSD Debug Module: CRSOCR
List CRSD Debug Module: CRSPE
List CRSD Debug Module: CRSPLACE
List CRSD Debug Module: CRSRES
List CRSD Debug Module: CRSRPT
List CRSD Debug Module: CRSRTI
List CRSD Debug Module: CRSSE
List CRSD Debug Module: CRSSEC
List CRSD Debug Module: CRSTIMER
List CRSD Debug Module: CRSUI
List CRSD Debug Module: CSSCLNT
List CRSD Debug Module: OCRAPI
List CRSD Debug Module: OCRASM
List CRSD Debug Module: OCRCAC
List CRSD Debug Module: OCRCLI
List CRSD Debug Module: OCRMAS
List CRSD Debug Module: OCRMSG
List CRSD Debug Module: OCROSD
List CRSD Debug Module: OCRRAW
List CRSD Debug Module: OCRSRV
List CRSD Debug Module: OCRUTL
List CRSD Debug Module: SuiteTes
List CRSD Debug Module: UiServer

$crsctl lsmodules css -- lists CSS modules that can be used for debugging
The following are the Cluster Synchronization Services modules::
    CSSD
    COMMCRS
    COMMNS
    CLSF
    SKGFD

$crsctl lsmodules evm -- lists EVM modules that can be used for debugging
The following are the Cluster Synchronization Services modules::
    CSSD
    COMMCRS
    COMMNS
    CLSF
    SKGFD

$crsctl start has   (HAS - High Availability Services)
$crsctl stop has

$crsctl check has
CRS-4638 Oracle High Availability Service is online

OCR Modules -- cannot be listed with crsctl lsmodules command
OCRAPI
OCRCLI
OCRSRV
OCRMAS
OCRMSG
OCRCAC
OCRRAW
OCRUTL
OCROSD

#crsctl debug statedump crs -- dumps state info for crs objects
#crsctl debug statedump css -- dumps state info for css objects
#crsctl debug statedump evm -- dumps state info for evm objects

#crsctl debug log crs [module:level]{,module:level} ...

-- Turns on debugging for CRS
#crsctl debug log crs CRSEVT:5,CRSAPP:5,CRSTIMER:5,CRSRES:5,CRSRTI:1,CRSCOMM:2
#crsctl debug log css [module:level]{,module:level} ...

-- Turns on debugging for CSS
#crsctl debug log css CSSD:1
#crsctl debug log evm [module:level]{,module:level} ...
-- Turns on debugging for EVM
#crsctl debug log evm EVMCOMM:1

#crsctl debug trace crs -- dumps CRS in-memory tracing cache
#crsctl debug trace css -- dumps CSS in-memory tracing cache
#crsctl debug trace evm -- dumps EVM in-memory tracing cache

#crsctl debug log res resource_name:level -- turns on debugging for resources
#crsctl debug log res "ora.lnx04.vip:1"

#crsctl trace all_the_above_commands -- tracing by adding a "trace" argument.
#crsctl trace check css
#crsctl backup -h
#crsctl backup css votedisk

Here is the list of the options for CRSCTL in 11gR2:
       crsctl add       - add a resource, type or other entity
       crsctl backup    - back up voting disk for CSS
       crsctl check     - check a service, resource or other entity
       crsctl config    - output autostart configuration
       crsctl debug     - obtain or modify debug state
       crsctl delete    - delete a resource, type or other entity
       crsctl disable   - disable autostart
       crsctl discover  - discover DHCP server
       crsctl enable    - enable autostart
       crsctl get       - get an entity value
       crsctl getperm   - get entity permissions
       crsctl lsmodules - list debug modules
       crsctl modify    - modify a resource, type or other entity
       crsctl query     - query service state
       crsctl pin       - Pin the nodes in the nodelist
       crsctl relocate  - relocate a resource, server or other entity
       crsctl replace   - replaces the location of voting files
       crsctl release   - release a DHCP lease
       crsctl request   - request a DHCP lease
       crsctl setperm   - set entity permissions
       crsctl set       - set an entity value
       crsctl start     - start a resource, server or other entity
       crsctl status    - get status of a resource or other entity
       crsctl stop      - stop a resource, server or other entity
       crsctl unpin     - unpin the nodes in the nodelist
       crsctl unset     - unset a entity value, restoring its default

How do I identify the voting disk/file location?
#crsctl query css votedisk

How to take backup of voting file/disk?
crsctl backup css votedisk

Initialization parameters in Oracle RAC 11gR2

SPECIAL parameters in RAC:
instance_number
instance_group
thread
cluster_database
cluster_database_instances
cluster_interconnects
remote_listener
local_listener
parallel_instance
max_commit_propagation_delay

Parameters with SAME VALUE across all instances:
active_instance_count
archive_lag_target
compatible
cluster_database
cluster_database_instances
cluster_interconnects      
control_files
db_block_size
db_domain
db_files
db_name
db_recovery_file_dest
db_recovery_file_dest_size
db_unique_name
dml_locks                     -- when 0
instance_type                -- rdbms or asm
max_commit_propagation_delay
parallel_max_servers
remote_login_password_file
trace_enabled
undo_management

For example,
*.cluster_database=true
*.cluster_database_instances=2
*.compatible='11.2.0.2'
*.undo_management='AUTO'
 
Parameters with UNIQUE Values across all instances:
instance_number
instance_name
thread
undo_tablespace/rollback_segments

For example,
INST1.instance_number=1
INST1.instance_name=rac1
INST1.thread=1
INST1.undo_tablespace='UNDOTBS1'

INST2.instance_number=2
INST2.instance_name=rac2
INST2.thread=2
INST2.undo_tablespace='UNDOTBS2'

OCRCHECK Command in 11gR2 RAC

$ ocrcheck -h
Name:
        ocrcheck - Displays health of Oracle Cluster/Local Registry.

Synopsis:
        ocrcheck [-config] [-local]

  -config       Displays the configured locations of the Oracle Cluster Registry.
                This can be used with the -local option to display the configured
                location of the Oracle Local Registry
  -local        The operation will be performed on the Oracle Local Registry.

Notes:
        A log file will be created in
        $ORACLE_HOME/log/<hostname>/client/ocrcheck_<pid>.log.
        File creation privileges in the above directory are needed
        when running this tool.

$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2736
         Available space (kbytes) :     259384
         ID                       :   33615009
         Device/File Name         :      +DATA
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured

         Cluster registry integrity check succeeded
         Logical corruption check succeeded

$ocrcheck -local
Status of Oracle Local Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2172
         Available space (kbytes) :     259948
         ID                       : 1558172427
         Device/File Name         : /u01/app/11.2.0/grid/cdata/rac3.olr
                                    Device/File integrity check succeeded

         Local registry integrity check succeeded
         Logical corruption check succeeded

$ ocrcheck -config
Oracle Cluster Registry configuration is :
         Device/File Name         :      +DATA

$ ocrcheck -local -config
Oracle Local Registry configuration is :
         Device/File Name         : /u01/app/11.2.0/grid/cdata/rac3.olr

NOTE:  Debugging can be controlled through $CRS_HOME/srvm/admin/ocrlog.in

$ ocrcheck -local    ---(need root permission)
PROTL-602: Failed to retrieve data from the local registry
PROCL-26: Error while accessing the physical storage Operating System error [Permission denied] [13]

# ocrcheck -local
Status of Oracle Local Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2172
         Available space (kbytes) :     259948
         ID                       : 1558172427
         Device/File Name         : /u01/app/11.2.0/grid/cdata/rak3.olr
                                    Device/File integrity check succeeded

         Local registry integrity check succeeded
         Logical corruption check succeeded