Search

Wednesday, August 10, 2016

Configure Dataguard Broker

Configure Dataguard Broker

In this article I would discuss the high level steps required to configure Dataguard broker and services to support transparent application failover. I am assuming that reader is aware of data guard environment and also knows how to perform role switch overs to minimize the downtime.  Without Dataguard broker one has to perform series of steps manually to perform role switch over which becomes cumbersome and subjected to human errors which could lead to unwanted downtime. Dataguard broker makes life easy and allows us to perform switch overs in single command. Moreover you can also enable fast start failover by configuring observer on any other server which would perform role switch over for you automatically if primary node fails due to some reason.

Lets discuss first what it takes to perform role switch overs without Dataguard Broker Configuration:
You need to first create service and trigger using below steps to support application failover:

exec DBMS_SERVICE.CREATE_SERVICE -
(service_name => 'pharmatest', -
network_name => 'pharmatest', -
aq_ha_notifications => true, -
failover_type => 'SELECT', -
failover_retries => 180, -
failover_delay => 1);

Create trigger to make sure that service starts only on primary database after role switch overs.

create or replace trigger start_database_service after startup on database
declare
role varchar2(50);
omode varchar2(50);
begin
select database_role into role from v$database;
  if role='PRIMARY' then
    dbms_service.start_service('pharmatest');
  end if;
end;
/

create entry for the service in tnsnames.ora file on application servers:
PHARMATEST=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)  (HOST=icmfg74pharma.am.xyz.com)(PORT=1531))(ADDRESS=(PROTOCOL=TCP)(HOST=icmfg57.am.xyz.com)(PORT=1530)))  (CONNECT_DATA=(FAILOVER_MODE=(TYPE=select)(METHOD=basic) (RETRIES=180)(DELAY=5))(SERVICE_NAME=PHARMATEST)))#

Performing a Role Transition using Switchover

A switchover allows a primary and standby to reverse roles without any data loss
and without any need to re-create the previous primary. In contrast, a failover
implies potential data loss and can result in the need for the old primary to be re-created.

Switchovers are normally performed for planned maintenance. For example, if the
primary host needed to replace a faulty CPU that required downtime, we could
 perform a switchover and have users automatically redirected to the new primary.
The impact to the user base could be greatly reduced, thus increasing our availability.

A switchover can be performed using either a physical or logical standby. However,
you should be aware of some issues. If you have a configuration with a primary
database, a physical standby, and a logical standby, and you perform a
switchover to the logical standby, your physical standby will no longer be a part
of the configuration and must be re-created. In the same scenario, if you perform
a switchover to the physical standby, the logical standby remains in the configuration
and does not need to be re-created.

For this reason, a physical standby is a better option for a switchover candidate than a
logical standby when multiple standby types exist in the configuration.

1.) Login to existing primary database and run below:
 select switchover_status from v$database;

2.) If step 1 returns value - 'TO STANDBY', run below command:
alter database commit to switchover to physical standby;
If the switchover status is reporting SESSIONS ACTIVE, we must use the session
shutdown clause:
 alter database commit to switchover to physical standby with
session shutdown;

3.) we must shut down and mount the former primary database:
shutdown immediate;
startup mount;
Time to celebrate; we now have two physical standby databases.
4.) During conversion of primary database to physical standby database, redo marker is generated and As soon as the standby receives and recovers that marker,it is eligible to become a primary database. Login to physical standby database and run below:
select switchover_status from v$database;
This should return 'PRIMARY' if redo marker stream is successfully received.
5.) alter database commit to switchover to primary;
6.) Start managed recovery of new physical standby database.

Configuring Dataguard Broker:

1.) Set the following parameters on Physical Standby database;
alter system set db_flashback_retention_target = 20160;
alter system set db_recovery_file_dest_size = 20G;
alter system set db_recovery_file_dest = '/exports/pmxexpdp/flash_recovery_area';
alter system set dg_broker_config_file1='/oracle/data/ora03/rec3p455/dr1rec3p455.dat' scope=spfile;
alter system set dg_broker_config_file2='/oracle/data/ora04/rec3p455/dr2rec3p455.dat' scope=spfile;
alter system set dg_broker_start=TRUE scope=both;
2.) set the following parameters on Primary database;
alter system set db_flashback_retention_target = 20160;
alter system set db_recovery_file_dest_size = 20G;
alter system set db_recovery_file_dest = '/oracle/data/ora02/rec2p455/flash_recovery_area';
alter system set dg_broker_config_file1='/oracle/data/ora03/rec2p455/dr1rec2p455.dat' scope=spfile;
alter system set dg_broker_config_file2='/oracle/data/ora02/rec2p455/dr2rec2p455.dat' scope=spfile;
alter system set dg_broker_start=TRUE scope=both;
3.) Add the entry in corresponding listener in listener.ora for each physical standby and primary databases - 
Physical Standby database:
#############################################################################
SID_LIST_LSNR_icmfg74pharma_1531= (SID_LIST=
 (SID_DESC= (GLOBAL_DBNAME = rec2p455_DGMGRL)(SID_NAME=rec2p455) (ORACLE_HOME=/oracle/product/root1) (SERVICE_NAME = rec2p455.localdomain))
#
LSNR_icmfg74pharma_1531= (ADDRESS_LIST=
 (ADDRESS= (PROTOCOL=tcp) (HOST=icmfg74pharma.am.xyz.com) (PORT=1531) (QUEUESIZE=512))
 (ADDRESS= (PROTOCOL=ipc) (KEY=EXTPROC1531))
)
TRACE_LEVEL_LSNR_icmfg74pharma_1531 = OFF
STARTUP_WAIT_TIME_LSNR_icmfg74pharma_1531 = 0
CONNECT_TIMEOUT_LSNR_icmfg74pharma_1531 = 10
#Enable dynamic resgitration for service rec2p455_dg that DMON process creates on the fly
DYNAMIC_REGISTRATION_LSNR_icmfg74pharma_1531 = ON
#
DIAG_ADR_ENABLED_LSNR_icmfg74pharma_1531=off
LOG_DIRECTORY_LSNR_icmfg74pharma_1531=/oracle/local/log
SID_LIST_LSNR_icmfg74pharma_1531= (SID_LIST=
 (SID_DESC= (GLOBAL_DBNAME = rec2p455_DGMGRL)(SID_NAME=rec2p455) (ORACLE_HOME=/oracle/product/root1) (SERVICE_NAME = rec2p455.localdomain))
)
###
Pirmary database:
###############################################################################
lsnr_icmfg57_DG_1530= (ADDRESS_LIST=
 (ADDRESS= (PROTOCOL=tcp) (HOST=icmfg57.am.xyz.com) (PORT=1530) (QUEUESIZE=512))
 (ADDRESS= (PROTOCOL=ipc) (KEY=EXTPROC1530))
 )
TRACE_LEVEL_lsnr_icmfg57_DG_1530 = OFF
DIAG_ADR_ENABLED_lsnr_icmfg57_DG_1530=OFF
STARTUP_WAIT_TIME_lsnr_icmfg57_DG_1530 = 0
# eliminate INBOUND WARNING msgs
INBOUND_CONNECT_TIMEOUT_lsnr_icmfg57_DG_1530 = 0
# 372959.1 10g error in log and causes lsnr to hang
# spawning duplicate listeners
SUBSCRIBE_FOR_NODE_DOWN_EVENT_lsnr_icmfg57_DG_1530=OFF
CONNECT_TIMEOUT_lsnr_icmfg57_DG_1530 = 10
DYNAMIC_REGISTRATION_lsnr_icmfg57_DG_1530 = ON
#
LOG_DIRECTORY_lsnr_icmfg57_DG_1530=/oracle/local/log
#
SID_LIST_lsnr_icmfg57_DG_1530= (SID_LIST=
 (SID_DESC= (GLOBAL_DBNAME = rec3p455_DGMGRL)(SID_NAME=rec3p455) (ORACLE_HOME=/oracle/product/root3) (SERVICE_NAME = rec3p455.localdomain))
)
###
4.) Reload the listeners for each primary and standby database.
lsnrctl reload
5.) Invoke DGMGRL and log into primary database - 
dgmgrl sys/pwd
create configuration rec2p455_dg as primary database is 'rec3p455' connect identifier is 'rec3p455';
add database 'rec2p455' as connect identifier is 'rec2p455' maintained as physical;
enable configuration;
edit database rec3p455 set property StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=icmfg57.am.xyz.com)(PORT=1530))(CONNECT_DATA=(SERVICE_NAME=rec3p455_DGMGRL)(SERVER=DEDICATED)))'
edit database rec2p455 set property StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=icmfg74pharma.am.xyz.com)(PORT=1531))(CONNECT_DATA=(SERVICE_NAME=rec2p455_DGMGRL)(SERVER=DEDICATED)))'
DGMGRL> show configuration
Configuration - rec2p455_dg
  Protection Mode: MaxPerformance
  Databases:
    rec3p455 - Primary database
    rec2p455 - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
DGMGRL>
DGMGRL> show database verbose rec2p455
Database - rec2p455
  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    rec2p455
  Properties:
    DGConnectIdentifier             = 'rec2p455'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = '/oracle/data/ora01/rec2p455/, /oracle/data/ora01/rec3p455/, /oracle/data/ora01/rec3p455/, /oracle/data/ora01/rec2p455/, /oracle/data/ora02/rec2p455/, /oracle/data/ora02/rec3p455/, /oracle/data/ora02/rec3p455/, /oracle/data/ora02/rec2p455/, /oracle/data/ora03/rec2p455/, /oracle/data/ora03/rec3p455/, /oracle/data/ora03/rec3p455/, /oracle/data/ora03/rec2p455/, /oracle/data/ora04/rec2p455/, /oracle/data/ora04/rec3p455/, /oracle/data/ora04/rec3p455/, /oracle/data/ora04/rec2p455/'
    LogFileNameConvert              = '/oracle/data/redo01/rec2p455/, /oracle/data/ora01/rec3p455/, /oracle/data/ora01/rec3p455/, /oracle/data/redo01/rec2p455/, /oracle/data/redo02/rec2p455/, /oracle/data/ora02/rec3p455/, /oracle/data/ora02/rec3p455/, /oracle/data/redo02/rec2p455/'
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    SidName                         = 'rec2p455'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=icmfg74pharma.am.xyz.com)(PORT=1531))(CONNECT_DATA=(SERVICE_NAME=rec2p455_DGMGRL)(INSTANCE_NAME=rec2p455)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = '/oracle/data/arch02/rec2p455/'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = 'rec2p455_%t_%s_%r.arc'
    TopWaitEvents                   = '(monitor)'
Database Status:
SUCCESS
DGMGRL>
DGMGRL> show database verbose rec3p455
Database - rec3p455
  Role:            PRIMARY
  Intended State:  TRANSPORT-ON
  Instance(s):
    rec3p455
  Properties:
    DGConnectIdentifier             = 'rec3p455'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'OPTIONAL'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = '/oracle/data/ora01/rec2p455/, /oracle/data/ora01/rec3p455/, /oracle/data/ora01/rec3p455/, /oracle/data/ora01/rec2p455/, /oracle/data/ora02/rec2p455/, /oracle/data/ora02/rec3p455/, /oracle/data/ora02/rec3p455/, /oracle/data/ora02/rec2p455/, /oracle/data/ora03/rec2p455/, /oracle/data/ora03/rec3p455/, /oracle/data/ora03/rec3p455/, /oracle/data/ora03/rec2p455/, /oracle/data/ora04/rec2p455/, /oracle/data/ora04/rec3p455/, /oracle/data/ora04/rec3p455/, /oracle/data/ora04/rec2p455/'
    LogFileNameConvert              = '/oracle/data/redo01/rec2p455/, /oracle/data/ora01/rec3p455/, /oracle/data/ora01/rec3p455/, /oracle/data/redo01/rec2p455/, /oracle/data/redo02/rec2p455/, /oracle/data/ora02/rec3p455/, /oracle/data/ora02/rec3p455/, /oracle/data/redo02/rec2p455/'
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    SidName                         = 'rec3p455'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=icmfg57.am.xyz.com)(PORT=1530))(CONNECT_DATA=(SERVICE_NAME=rec3p455_DGMGRL)(INSTANCE_NAME=rec3p455)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = '/oracle/data/arch01/rec3p455/'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = 'rec3p455_%t_%s_%r.arc'
    TopWaitEvents                   = '(monitor)'
Database Status:
SUCCESS
DGMGRL>
For performing switchover, login to primary database using DGMGL and run below:
switchover to rec3p455;
Similary on physical standby run below:
switchover to rec2p455;
And Voila, you have successfully performed role switchovers using dataguard broker in single command.
One thing I forgot to mention earlier regarding dynamic registration of DG Broker internal service.
Every time you perform switchover, an internal static service with name dbname_DGB (rec3p455_DGB in our case) gets created and if dynamic registration is off, connection to this service will fail.
So, you should always keep DYNAMIC_REGISTRATION_lsnr_icmfg57_DG_1530 = ON otherwise switch overs will fail with below errors in alert log:
Fatal NI connect error 12514, connecting to:
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(HOST=icmfg57.am.xyz.com)(PROTOCOL=TCP)(PORT=1541)))(CONNECT_DATA=(SERVICE_NAME=rec3p455_DGB)(INSTANCE_NAME=rec3p455)(CID=(PROGRAM=oracle)(HOST=icmfg74pharma)(USER=oracle))))

A big Thank You to my colleague DBA Anuj Kumar who helped me figuring this out while initially setting up DG Broker in my environment.