ORACLE DATABASE ADMINISTRATOR: 2016

Tuesday, August 16, 2016

Oracle Data Guard Version 11.2.0.3

General Information

Data Guard is a technology built into Enterprise Edition. It is not a download or an add-on and with the excpetion of Active Data Guard fully included, as of this writing, in the basic EE license.

The following was posted by DG Product Manager Larry Carpenter to the OTN forums on 1-Jun-2010. A small amount of editing has been done for consistency.

Maximum Protection and Maximum Availability are Data Guard's Zero Data Loss protection modes. The only difference is that in Maximum Protection Data Guard will abort the Primary (not allowing a transaction to be reported as committed to the application) if the redo cannot be written to a SYNC standby. In Maximum Availability it is allowed for a transaction to be committed even if there are no SYNC standby destinations available. That would be the 1st failure, the standby is not reachable. So when the Primary failed after that (the 2nd failure) you would of course lose data, the data that was generated while the standby was not reachable before the Primary failed.

However, this is not your situation. What you are experiencing is a misunderstanding of how the databases work.

In Oracle Versions prior to 11.2 the SYNC redo transport works as follows.

1. User says COMMIT.
2. LGWR writes the redo to the Online Redo Log.
3. The LNS processes grabs the same redo from the Log Buffer AFTER the LGWR has written the redo to the Online redo log file, and sends it to their standby.
4. When the SYNC standbys all report back that the redo has been received and written to the standby redo log file OR they fail to respond within NET_TIMEOUT seconds is the client told that the transaction has in fact committed.

What you are experiencing is the failure is happening between step 2 and step 3. The redo is written to the online redo log file at the primary but the client has not been told that the transaction has actually committed.

When you failover and do the select count(*) from the Primary you have the exact count of transactions that the client was actually told had been committed. When you then restarted the Primary what does it do? Crash recovery! Which will recover the redo that was actually written to the online redo log file which is that 'last' row which was the current transaction but was never actually 'committed' to the application. Your did not actually lose that row as the application was never told that it was committed.

In a real situation you would have failed over and then when the Primary came back you would have reinstated it as a standby and not opened it.

Try your test where your test application prints out the counter AFTER the commit has finished and then pull the plug, failover and see if the count in the table on the standby matches the counter in the application.

Final result, you didn't lose any data that the application was told was committed.

Larry

P.S. As of 11.2 the write to the Online Redo Log file by the LGWR and the send to the SYNC standby happens in parallel so you will see even less of what you saw when you try your test in the same way in 11.2)

Acronyms

Acronym	Short For
ARC0	Archiver Process on Local Server
ARCH	Archiver
ASYNC	Asynchronous
DMON	Data Guard Monitor
DR	Disaster Recovery
FAL	Fetch Archive Log
LCR	Logical Change Record
LGWR	Logwriter
LNS	Network Server Process
LSP	Logical Standby Process (SQL Apply for Logical DG)
MRP	Managed Recovery Process (Redo Apply for Physical DG)
RFS	Remote File Server
SYNC	Synchronous
TAF	Transparent Application Failover

Data Dictionary Objects for Physical DataGuard

DBA_LOG_GROUPS	DBA_HIST_LOG	GV$MANAGED_STANDBY
DBA_LOG_GROUP_COLUMNS	GV$DATAGUARD_CONFIG	V$DATABASE
DBA_REGISTERED_ARCHIVED_LOG	GV$DATAGUARD_STATUS	V$STANDBY_EVENT_HISTOGRAM

Data Dictionary Objects for Logical DataGuard

DBA_LOGSTDBY_EVENTS	DBA_LOGSTDBY_SKIP	GV$DATAGUARD_CONFIG
DBA_LOGSTDBY_HISTORY	DBA_LOGSTDBY_SKIP_TRANSACTION	GV$DATAGUARD_STATS
DBA_LOGSTDBY_LOG	DBA_LOGSTDBY_UNSUPPORTED	GV$DATAGUARD_STATUS
DBA_LOGSTDBY_NOT_UNIQUE	DBMS_DRS	GV$LOGSTDBY
DBA_LOGSTDBY_PARAMETERS	DBMS_LOGSTDBY	GV$LOGSTDBY_STATS
DBA_LOGSTDBY_PROGRESS	GV$ARCHIVE_GAP	GV$STANDBY_LOG

Docs

Doc Link	Data Type
Click Here	11g Data Guard Docs
Click Here	11g Data Guard - What's New
Click Here	Data Guard Protection Modes
Click Here	Switchover Role Transitions
Click Here	Fast start Failovers
Click Here	Data Guard Broker

Data Guard Broker

Start Data Guard Manager

$ DGMGRL

Add a standby database to the broker configuration

ADD DATABASE <database_name> [as connect identifier is <connect-identifier>];

DGMGRL> add database proda;

Connect to the database via the broker

CONNECT username[@connect-identifier]

DGMGRL> connect sys/syspwd
Connected.
Error:
ORA-16525: the Data Guard broker is not yet available
ORA-06512: at SYS:DBMS_DRS", line 124
ORA-06512: at line 1

SQL> ALTER SYSTEM SET dg_broker_start = TRUE;
System altered.

DGMGRL> connect sys/syspwd
Connected.
DGMGRL> exit

SQL> alter system set DG_BROKER_START = FALSE;

Convert or revert a standby between physical and snapshot

CONVERT <db_unique_name> TO <SNAPSHOT | PHYSICAL> STANDBY;

DGMGRL> CONVERT DATABASE prodb TO snapshot standby;

Create a broker configuration

CREATE CONFIGURATION <configuration-name> AS
PRIMARY DATABASE IS <database-name>
CONNECT IDENTIFIER IS <connect-identifier>;

DGMGRL> CREATE CONFIGURATION 'dg_prod_cfg' AS
PRIMARY DATABASE IS 'proda'
CONNECT IDENTIFIER IS proda.mlib.com;

Disable a configuration

DISABLE CONFIGURATION;

DGMGRL> DISABLE CONFIGURATION;

Disable a database

DISABLE DATABASE <database-name>;

DGMGRL> DISABLE DATABASE prodb;

Disable Fast-Start Failover

DISABLE FAST_START FAILOVER [FORCE];

DGMGRL> DISABLE FAST_START FAILOVER FORCE;

Disable Fast-Start Failover Condition

DISABLE FAST_START FAILOVER CONDITION <value>;

DGMGRL> DISABLE FAST_START FAILOVER CONDITION "corrupted controlfile";

Edit a configuration property

EDIT CONFIGURATION SET PROPERTY <property_name> = <value>;

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold=60;

DGMGRL> EDIT DATABASE 'site1_edrsr8p1' SET PROPERTY 'LogXptMode'='SYNC';

Edit configuration (protection mode)

EDIT CONFIGURATION SET PROTECTION MODE AS <MAXPROTECTION | MAXAVAILABILITY | MAXPERFORMANCE>;

DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;

Edit database property

EDIT DATABASE <database-name> SET PROPERTY=<value>;

DGMGRL> EDIT DATABASE prodb SET PROPERTY='ArchiveLagTarget'=900;

Edit database rename

EDIT DATABASE <database-name> RENAME TO =<new-database-name>;

DGMGRL> DISABLE DATABASE prodb;
DGMGRL> EDIT DATABASE prodb RENAME TO='prodc';
DGMGRL> ENABLE DATABASE prodc;

Edit database state

EDIT DATABASE <database-name> SET STATE=<database state>
[WITH APPLY INSTANCE=<instance-name>];

Database State

APPLY-OFF (physical or logical standby database only)

APPLY-ON (physical or logical standby database only)

TRANSPORT-OFF (primary database only)

TRANSPORT-ON (primary database only)

DGMGRL> EDIT DATABASE prodb SET STATE=apply-off;

Edit Instance

EDIT INSTANCE <instance-name> [ON DATABASE <database-name>]
SET auto pfile=<initialization-file | off>;

DGMGRL> EDIT INSTANCE 'prodb' ON DATABASE 'prodb' SET AUTO PFILE='initprodb.ora';

DGMGRL> EDIT INSTANCE 'prodb' ON DATABASE 'prodb'
SET PROPERTY 'StandbyArchiveLocation'='/u03/prodb/arch/';

Enable Configuration

ENABLE CONFIGURATION;

DGMGRL> ENABLE CONFIGURATION;

Enable Database

ENABLE DATABASE <database-name>;

DGMGRL> ENABLE DATABASE prodb;

Enable Fast_Start Failover

ENABLE FAST_START FAILOVER;

DGMGRL> ENABLE FAST_START FAILOVER;

Enable Fast_Start Failover Condition

ENABLE FAST_START FAILOVER CONDITION <value>;

Fast_Start Failover Conditions
Corrupted Controlfile	Corrupted controlfile
Corrupted Dictionary	Dictionary corruption of a critical database object
Datafile Offline	Data file offline due to a write error
Inaccessible Logfile	LGWR is unable to write to any member of a log group due to an I/O error
Stuck Archiver	Archiver is unable to archive a redo log because device is full or unavailable

DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Corrupted Dictionary";

Exit the Data Guard Manager program

EXIT;

DGMGRL> EXIT;

Change a standby database to be the primary database

FAILOVER TO <database-name> [IMMEDIATE];

DGMGRL>FAILOVER TO prodb IMMEDIATE;

Display description and syntax for a command

HELP;

DGMGRL> HELP;

DGMGRL> HELP SHOW;

Exit the program

QUIT;

DGMGRL> QUIT;

Change a disabled database into a viable standby database

REINSTATE DATABASE <database_name>;

DGMGRL>REINSTATE DATABASE prodb;

Comment (remark) to be ignored by DGMGRL

rem <remark_string>;

DGMGRL>REM 'this is a comment';

Removes config info, including all DB profiles and terminates management of all of associated databases

REMOVE CONFIGURATION [PRESERVE DESTINATIONS];

DGMGRL> SHOW CONFIGURATION;
DGMGRL> REMOVE CONFIGURATION PRESERVE DESTINATIONS;
DGMGRL> SHOW CONFIGURATION;

Removes the specified standby database's profile from the broker configuration and terminates broker management of the standby database

REMOVE DATABASE <database_name> [PRESERVE DESTINATIONS];

DGMGRL> REMOVE DATABASE proda PRESERVE DESTINATIONS;

Removes the specified standby database's profile from the broker configuration and terminates broker management of the standby database

REMOVE INSTANCE <instance-name> [ON DATABASE <database-name>];

DGMGRL> SHOW CONFIGURATION;
DGMGRL> REMOVE INSTANCE proda ON DATABASE proda;
DGMGRL> SHOW CONFIGURATION;

Display information about a configuration, database, and/or instance

SHOW CONFIGURATION [VERBOSE | <property_name>];

DGMGRL> SHOW CONFIGURATION VERBOSE;

Display information about a database

SHOW DATABASE [VERBOSE] <database-name> [<property-name>];

DGMGRL> SHOW DATABASE VERBOSE;

Display information about a fast_start failover configuration

SHOW FAST_START FAILOVER;

DGMGRL> SHOW FAST_START FAILOVER;

Display information about an instance

SHOW INSTANCE [VERBOSE] <instance-name> [ON DATABASE <database-name>];

DGMGRL> SHOW INSTANCE VERBOSE;

Shutdown a currently running Oracle instance

SHUTDOWN [<ABORT | IMMEDIATE | NORMAL>];

DGMGRL> SHUTDOWN IMMEDIATE;

Start Fast-Start Failover observer

START OBSERVER [FILE=observer_configuration_filename];

DGMGRL> START OBSERVER '/u01/oracle/admin/orabase/dgbroker/orabase_dgcfg';

Start an Oracle database instance

STARTUP [FORCE] [RESTRICT] [PFILE=<filename>]
[<MOUNT | NOMOUNT | OPEN <READ ONLY | READ WRITE>>];

DGMGRL> STARTUP;

Stop the Fast_Start Failover Observer

STOP OBSERVER;

DGMGRL> STOP OBSERVER;

Switch roles between the primary database and a standby database

SWITCHOVER TO <database-name>;

DGMGRL> SWITCHOVER TO prodc;

Grid Infrastructure Data Guard

Role Swap On Physical Standby

srvctl modify database -d <physStandby> -s open -r primary

srvctl modify database -d TSTC -s open -r primary

Role Swap On Primary

srvctl modify database -d <Primary> -s mount -r physical_standby

srvctl modify database -d TSTA -s open -r physical_standby

12cR1 New Features

Far Sync

A Data Guard Far Sync instance is a remote Data Guard destination that accepts redo from the primary database and then ships that redo to other members of the Data Guard configuration.

Manages a control file, receives redo into standby redo logs (SRLs), and archives those SRLs to local archived redo logs.

A Far Sync instance does not have user data files, cannot be opened for access, cannot run redo apply, and can never function in the primary role or be converted to any type of standby database.

Far Sync instances are part of the Active Data Guard Far Sync feature, which requires an Active Data Guard license.

TBD

FastSync

LOG_ARCHIVE_DEST_n attributes SYNC and NOAFFIRM can be used together enabling a synchronous standby to be deployed at a futher distance from the primary site.

TBD

Operating System Installation

As root: Create DBA Group and Oracle User (both servers)

$ /usr/sbin/groupadd oinstall
$ /usr/sbin/groupadd dba
$ /usr/sbin/useradd -m -g oinstall -G dba oracle
$id oracle

$passwd oracle
-- set the password to oracle1

$ id nobody
-- if nobody does not exist then
$ /usr/sbin/useradd nobody

As root: Create Installation Directories (both servers)

mkdir -p /app/oracle
chown -R oracle:dba /app/oracle
chmod -R 775 /app/oracle

mkdir /stage
chown -R oracle:dba /stage

As root: Alter Kernel Parameters in sysctl.conf by appending the highlighted text

(both servers)

cd /etc

vi sysctl.conf

fs.file-max = 65536
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
# send receive buffers
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 8388608
net.core.wmem_max = 8388608
# tcp read and write buffers
net.ipv4.tcp_rmem=4096 262144 8388608
net.ipv4.tcp_wmem=4096 262144 8388608

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.ip_local_port_range = 1024 65000

-- activate changes
/sbin/sysctl -p

As root: Set Shell Limits

(both servers)

cd /etc/security

-- modify limits.conf

cat >> /etc/security/limits.conf <<EOF
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
EOF

-- modify /etc/pam.d/ login
-- read the file and place above the last lines as indicated

session required /lib/security/pam_limits.so
session required pam_limits.so

As root: Change Default Profile

(both servers)

cd /etc

-- if Bourne, Korn or Bash shell

cat >> /etc/profile <<EOF
if [ \$USER = "oracle" ]; then
if [ \$SHELL = "/bin/ksh" ]; then
    ulimit -p 16384
    ulimit -n 65536
else
    ulimit -u 16384 -n 65536
fi
umask 022
fi
EOF

As root: Append servers to hosts file

(both servers)

cd /etc

Note: the line: 127.0.0.1 omega#.mlib.org omega# localhost.localadmin localhost must remain. If you delete it ... nothing will work properly

vi hosts

192.168.1.101	omega1	omega1.mlib.org
192.168.1.102	omega2	omega2.mlib.org
10.0.1.1	dgomega1	dgomega1.mlib.org
10.0.1.2	dgomega2	dgomega2.mlib.org
192.168.10.01	omega1st	omega1st.mlib.org
192.168.10.02	omega2st	omega2st.mlib.org
192.168.10.100	netapp	netapp.mlib.org
192.168.1.119	bigdog	bigdog.mlib.org

-- ping all hosts (substitute actual node names for the "omega")
ping 192.168.1.1
ping www.oracle.com
ping omega1
ping omega2
ping omega1.mlib.org
ping omega2.mlib.org

Modify .bashrc (both servers)

cd /home/oracle

vi .bashrc

umask 022
ORACLE_BASE=/app/oracle/product
ORACLE_HOME=/app/oracle/product/11.2.0/dbhome_1
# ORACLE_SID=proda
# ORACLE_SID=prodb
PATH=$ORACLE_HOME/bin:$PATH
export PATH ORACLE_BASE ORACLE_HOME ORACLE_SID

-- sid name on omega1 is proda
-- sid name on omega2 is prodb

initproda.ora

proda.__db_cache_size=687865856
proda.__java_pool_size=16777216
proda.__large_pool_size=16777216
proda.__oracle_base='/app/oracle/product'
proda.__pga_aggregate_target=671088640
proda.__sga_target=989855744
proda.__shared_io_pool_size=0
proda.__shared_pool_size=251658240
proda.__streams_pool_size=0
*.audit_file_dest='/app/oracle/product/admin/proda/adump'
*.audit_trail='db'
*.compatible='11.1.0.0.0'
*.control_files= '/app/oracle/product/oradata/proda/control01.ctl', '/app/oracle/product/oradata/proda/control02.ctl',
'/app/oracle/product/oradata/proda/control03.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='proda'
*.db_recovery_file_dest='/app/oracle/product/fast_recovery_area'
*.db_recovery_file_dest_size=2147483648
*.diagnostic_dest='/app/oracle/product'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=prodaXDB)'
*.memory_target=1658847232
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.undo_tablespace='UNDOTBS1'
ifile=?/dbs/proda_dg.ora

initprodb.ora

prodb.__db_cache_size=687865856
prodb.__java_pool_size=16777216
prodb.__large_pool_size=16777216
prodb.__oracle_base='/app/oracle/product'
prodb.__pga_aggregate_target=671088640
prodb.__sga_target=989855744
prodb.__shared_io_pool_size=0
prodb.__shared_pool_size=251658240
prodb.__streams_pool_size=0
*.audit_file_dest='/app/oracle/product/admin/proda/adump'
*.audit_trail='db'
*.compatible='11.1.0.0.0'
*.control_files='/app/oracle/product/oradata/proda/control01.ctl', '/app/oracle/product/oradata/proda/control02.ctl', '/app/oracle/product/oradata/proda/control03.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='proda'
*.db_recovery_file_dest='/app/oracle/product/fast_recovery_area'
*.db_recovery_file_dest_size=2147483648
*.diagnostic_dest='/app/oracle/product'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=prodbXDB)'
*.memory_target=1658847232
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.undo_tablespace='UNDOTBS1'
ifile=?/dbs/prodb_dg.ora

On omega1 create a password file

$ cd $ORACLE_HOME/dbs

$ ls -la

orapwd file=orapwproda password=oracle1 ignorecase=y

set linesize 121
col name format a30
col value format a50

SELECT name, value
FROM gv$parameter
WHERE name LIKE '%remote_login%';

On omega2 create a password file

$ cd $ORACLE_HOME/dbs

$ ls -la

orapwd file=orapwprodb password=oracle1 ignorecase=y

set linesize 121
col name format a30
col value format a50

SELECT name, value
FROM gv$parameter
WHERE name LIKE '%remote_login%';

omega1: listener.ora

-- Note: SDU needs to be a multiple of MTU

# /sbin/ifconfig
# listener.ora Network Configuration File: /app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

DG_LISTENER =
(DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.#.1)(PORT = 1526))
     (SEND_BUF_SIZE=9375000)
      (RECV_BUF_SIZE=9375000))
    )
)

SID_LIST_DG_LISTENER =
(SID_LIST =
    (SID_DESC =
      (SDU = 32767)
      (GLOBAL_DBNAME = proda)
      (ORACLE_HOME = /app/oracle/product/11.2.0/dbhome_1)
      (SID_NAME = proda)
    )
)

LISTENER =
(DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = omega1.mlib.org)(PORT = 1521))
    )
)

SID_LIST_LISTENER =
(SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /app/oracle/product/11.2.0/dbhome_1)
      (PROGRAM = extproc)
   )
)

omega2: listener.ora

# listener.ora Network Configuration File: /app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

DG_LISTENER =
(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.#.2)(PORT = 1526)
    (SEND_BUF_SIZE=9375000)
    (RECV_BUF_SIZE=9375000))
)

SID_LIST_LISTENER =
(SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /app/oracle/product/11.2.0/dbhome_1)
      (PROGRAM = extproc)
    )
)

SID_LIST_DG_LISTENER =
(SID_LIST =
    (SID_DESC =
      (SDU=32767)
      (GLOBAL_DBNAME = prodb)
      (ORACLE_HOME = /app/oracle/product/11.1.0/dbhome_1)
      (SID_NAME = prodb)
    )
)

LISTENER =
(DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = omega2.mlib.org)(PORT = 1521))
    )
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
)

both servers: sqlnet.ora

NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)
DEFAULT_SDU_SIZE=32767

both servers: tnsnames.ora

# tnsnames.ora Network Configuration File: /app/oracle/product/11.2.0/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

DGLOGSHIPB =
(DESCRIPTION =
    (SDU = 32767)
    (SEND_BUF_SIZE=9375000)
    (RECV_BUF_SIZE=9375000)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.#.2)(PORT = 1526))
    )
   (CONNECT_DATA =
      (SERVICE_NAME = prodb)
    )
)

DGLOGSHIPA =
(DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.#.1)(PORT = 1526))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = proda)
    )
)

PRODB =
(DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = omega2.mlib.org)(PORT =1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = prodb)
    )
)

PRODA =
(DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = omega1.mlib.org)(PORT =1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = proda)
    )
)

REPOS =
(DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.119)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = repos)
    )
)

Physical Data Guard

dg_proda.ora

Note: Set net_timeout between 10-30 sec.

remote_login_passwordfile='EXCLUSIVE'

# set max_processes equal to max_connections + 1 for local
log_archive_max_processes=4

log_archive_config='DG_CONFIG=(proda,prodb)'
db_unique_name='proda'
# db_name='proda'
# service_names='proda'

fal_client=proda
fal_server=prodb

standby_file_management='auto'

log_file_name_convert=' ',' '

log_archive_dest_1='LOCATION=/app/oracle/product/fast_recovery_area/arch
valid_for=(all_logfiles,all_roles) db_unique_name=proda'

log_archive_dest_2='service=dglogshipb valid_for=(online_logfile,primary_role) db_unique_name=prodb NET_TIMEOUT=15 REOPEN=60 OPTIONAL MAX_CONNECTIONS=3'

log_archive_dest_state_1='ENABLE'
log_archive_dest_state_2='ENABLE'

# define archive log naming format
log_archive_format=%t+%s+%r.arc

dg_prodb.ora

remote_login_passwordfile='EXCLUSIVE'
# set max_processes equal to max_connections + 1 for local
log_archive_max_processes=4

log_archive_config='DG_CONFIG=(prodb,proda)'
db_unique_name='prodb'
# db_name='proda'
# service_names='proda'

fal_client=prodb
fal_server=proda

standby_file_management='auto'

log_file_name_convert=' ',' '

log_archive_dest_1= 'LOCATION=/app/oracle/product/fast_recovery_area/arch
valid_for=(all_logfiles,all_roles) db_unique_name=prodb'

log_archive_dest_2='service=dglogshipa valid_for=(online_logfile,primary_role) db_unique_name=proda
NET_TIMEOUT=15 REOPEN=60 OPTIONAL MAX_CONNECTIONS=3'

log_archive_dest_state_1='ENABLE'
log_archive_dest_state_2='ENABLE'

# define archive log naming format
log_archive_format=%t+%s+%r.arc

Prepare omega1 for backup

SQL> startup mount;
SQL> show parameter %archive%

SQL> SELECT inst_id, log_mode FROM v$database;

SQL> alter database archivelog;
SQL> alter database force logging;
SQL> alter database add supplemental log data;
SQL> alter database open;
SQL> archive log list;
SQL> alter system switch logfile;
SQL> archive log list;
SQL> SELECT dest_id, valid_type, valid_role, valid_now
FROM gv$archive_dest;

Create RMAN catalog and backup omega1

-- as oracle on bigdog
$ sqlplus / as sysdba

-- using OMF
CREATE TABLESPACE cat_tbs;

CREATE USER repoomega#
IDENTIFIED BY repoomega#
DEFAULT TABLESPACE cat_tbs
TEMPORARY TABLESPACE temp
QUOTA UNLIMITED ON cat_tbs;

GRANT create session, recovery_catalog_owner TO repoomega#;
GRANT execute ON dbms_stats TO repoomega#; -- for class only

Duplicate database for Standby on omega2

SQL> conn / as sysdba

SQL> startup nomount;

$ rman target sys/oracle1@proda auxiliary /

RMAN> duplicate target database for standby nofilenamecheck;

RMAN> exit;

On both servers create a password file

$ cd $ORACLE_HOME/dbs

$ ls -la

orapwd file=orapwprodb password=oracle1

set linesize 121
col name format a30
col value format a50

SELECT name, value
FROM gv$parameter
WHERE name LIKE '%remote_login%';

Step 3: Configure Standby Redo Logs

Minimally, the configuration should have one more standby redo log file group than the number of online redo log file groups on the primary database. However, the recommended number of standby redo log file groups is dependent on the number of threads on the primary database. Use the following equation to determine an appropriate number of standby redo log file groups:

(max. number of logfiles for each thread + 1) * max. number of threads

Using this equation reduces the likelihood that the primary instance's log writer (LGWR) process will be blocked because a standby redo log file cannot be allocated on the standby database. For example, if the primary database has 2 log files for each thread and 2 threads, then 6 standby redo log file groups are needed on the standby database.

By default Oracle has 1 thread with three log files so: (3+1) * 1 = 4

Physical Standby Commands

Complete applying all available redo in preparation for a failover. Use the FINISH clause only in the event of primary database failure.

ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE [FINISH];

ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE FINISH;

Create standby logfile

ALTER DATABASE ADD STANDBY LOGFILE '<path_and_file_name>' SIZE <integer> <M | G>;

ALTER DATABASE ADD STANDBY LOGFILE '/app/oracle/product/dbs/stdbylog01a.log' SIZE 50G;

Switchover of a primary database to standby

ALTER DATABASE <PREPARE | COMMIT> TO SWITCHOVER TO PHYSICAL STANDBY [WITH SESSION SHUTDOWN];

SELECT switchover_status FROM v$database;

ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;

-- on the "new" primary if it the first time
ALTER SYSTEM SET DB_CREATE_FILE_DEST = '/u01/oradata/orabase';
ALTER TABLESPACE temp ADD TEMPFILE;

Convert a primary to a standby

ALTER DATABASE CONVERT TO <PHYSICAL | SNAPSHOT> STANDBY;

ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

Start redo apply as a foreground process

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE;

Stop redo apply

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Start redo apply as a background process

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT [FROM SESSION];

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

Cancel an apply time delay

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY;

Start real-time apply as a foreground process

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE;

Start real-time apply as a background process

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE [DISCONNECT FROM SESSION];

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;

From the standby manually register log files from a failed primary

ALTER DATABASE REGISTER [OR REPLACE] [PHYSICAL] LOGFILE <file_path_and_name>
[FOR <logminer_session_name>];

ALTER DATABASE REGISTER LOGFILE '/u01/dest1/stdbylog01a.log';

Change the protection mode

ALTER DATABASE SET STANDBY DATABASE TO <AVAILABILITY | PERFORMANCE | PROCTECTION>;

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
ALTER DATABASE OPEN;

Switchover of a primary database to standby

ALTER DATABASE <PREPARE | COMMIT> TO SWITCHOVER TO ....;

SELECT switchover_status FROM V$database;

ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;

At failover send unsent redo to the standby

ALTER SYSTEM FLUSH REDO SQL

ALTER SYSTEM FLUSH REDO SQL;

Active Data Guard

Description

Active Data Guard isn't a type of standby. It is the option that allows you to open a physical standby read only and restart the apply. It also includes the right to turn on block change tracking. In 11.2 it also includes automatically repairing corrupt blocks. This option requires that your purchase an additional license.

Steps to convert a physical standby into an Active Data Guard standby

-- stop redo application
alter database recover managed standby database cancel;

-- if RAC shut down all but one instance

-- the database should now be mounted but not open. Open the database in read-only mode
alter database open read only;

-- restart managed recovery
alter database recover managed standby database using current logfile disconnect;

-- verify that a fast recovery area has been configured.

Snapshot Standby

Description

A snapshot standby database is a fully updatable standby database that is created by converting a physical standby database into a snapshot standby database. A snapshot standby database receives and archives, but does not apply, redo data from a primary database. Redo data received from the primary database is applied when a snapshot standby database is converted back into a physical standby database, after discarding all local updates to the snapshot standby database.

A snapshot standby database typically diverges from its primary database over time because redo data from the primary database is not applied as it is received. Local updates to the snapshot standby database will cause additional divergence. The data in the primary database is fully protected however, because a snapshot standby can be converted back into a physical standby database at any time, and the redo data received from the primary will then be applied.

A snapshot standby database provides disaster recovery and data protection benefits that are similar to those of a physical standby database. Snapshot standby databases are best used in scenarios where the benefit of having a temporary, updatable snapshot of the primary database justifies increased time to recover from primary database failures.

Snapshot standby does not require any extra licensing as it is part of Data Guard in the EE license.

Steps to convert a physical standby into a Snapshot standby

-- on the physical standby stop the recovery process
alter database recover managed standby database cancel;

-- if RAC shut down all but one instance the database should now be mounted but not open. Open the database in read-only mode
alter database open read only;

-- restart managed recovery
alter database recover managed standby database disconnect;

-- verify that a fast recovery area has been configured

SQL> ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;

Convert a snapshot to physical standby

SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

Related SQL Statements

Examine Database Characteristics

SELECT inst_id, open_resetlogs, open_mode, protection_mode, protection_level
FROM v$database;

SELECT inst_id, remote_archive, activation#, switchover#, database_role
FROM v$database;

SELECT inst_id, archivelog_compression, switchover_status, dataguard_broker, guard_status
FROM v$database;

SELECT inst_id, supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_ui, supplemental_log_data_fk, supplemental_log_data_all, force_logging
FROM v$database;

col name format a20
col db_unique_name format a20

SELECT inst_id, current_scn, db_unique_name, name, standby_became_primary_scn
FROM v$database;

col fs_failover_observer_host format a20

SELECT fs_failover_status, fs_failover_current_target, fs_failover_observer_present, fs_failover_observer_host
FROM v$database;

Data Guard Monitoring

col hostname format a30

SELECT db_name, hostname, log_archived, log_applied, applied_time,
       log_archived-log_applied AS GAP
FROM (
SELECT name AS DB_NAME
FROM v$database),
     (
SELECT UPPER(SUBSTR(host_name, 1, (DECODE(INSTR(host_name,'.'), 0,
LENGTH(host_name), (INSTR(host_name,'.')-1))))) HOSTNAME
FROM v$instance),
     (
SELECT MAX(sequence#) AS LOG_ARCHIVED
FROM v$archived_log
WHERE dest_id = 1
AND archived = 'YES'),
     (
SELECT MAX(sequence#) AS LOG_APPLIED
FROM v$archived_log
WHERE dest_id = 2
AND applied = 'YES'),
     (
SELECT TO_CHAR(MAX(completion_time),'DD-MON/HH24:MI') AS APPLIED_TIME
FROM v$archived_log
WHERE dest_id = 2
AND applied = 'YES');

Switchover Demo

SELECT name, current_scn, guard_status, switchover_status, database_role, checkpoint_change#, archive_change#, archivelog_change#, db_unique_name, platform_name
FROM v$database;

ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-YY HH24:MI:SS';

SELECT sequence#, first_change#, next_change#, standby_dest, completion_time, is_recovery_dest_file, fal
FROM v$archived_log;

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN WAIT;

STARTUP;

ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;

ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN WAIT;

SELECT SWITCHOVER_STATUS FROM GV$DATABASE;

ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN WAIT;

Diagnose Logical Data Guard Failures

-- determine replication lag
SELECT name, value, unit
FROM v$dataguard_stats;

-- determine the schema was maintained by SQL Apply
SELECT owner
FROM dba_logstdby_skip
WHERE statement_opt = 'INTERNAL SCHEMA'
ORDER BY 1;

-- check the standby site for for error messages
col error format a30
col destination format a30

SELECT dest_id, status, destination, error
FROM v$archive_dest
ORDER BY 1;

Command

	Effect
ADD DATABASE	Adds a new standby database profile to the existing broker configuration DGMGRL> ADD DATABASE ‘secondary’ AS CONNECT IDENTIFIER IS secondary MAINTAINED AS PHYSICAL;
CONNECT	Connects to the specified database using the specified username. DGMGRL> connect sys/manager@primary Connected.
CREATE CONFIGURATION	Creates a broker configuration and creates and adds a primary database profile to the configuration. DGMGRL> CREATE CONFIGURATION ‘sample’ AS PRIMARY DATABASE IS ‘primary’ CONNECT IDENTIFIER IS primary; Configuration “sample” created with primary database “primary”
DISABLE CONFIGURATION	Disables broker management of a configuration so that the configuration and all of its databases are no longer managed by the broker. DGMGRL> disable configuration; Disabled.
DISABLE DATABASE	Disables broker management of the named standby database. DGMGRL> disable database secondary; Disabled.
EDIT CONFIGURATION (Protection Mode)	Changes the current protection mode setting for the broker configuration DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXPerformance; Succeeded. DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXPROTECTION; Operation requires shutdown of instance “primary” on database “primary” Shutting down instance “primary”… Database closed. Database dismounted. ORACLE instance shut down. Operation requires startup of instance “primary” on database “primary” Starting instance “primary”…
EDIT DATABASE (Property)	Changes the value of a property for the named database DGMGRL> EDIT DATABASE ‘primary’ SET PROPERTY ‘ArchiveLagTarget’=200; Property “ArchiveLagTarget” updated DGMGRL> EDIT DATABASE ‘primary’ SET PROPERTY ‘StandbyArchiveLocation’='C:\app\pa tilsa\product\11.2.0\dbhome_1\oradata’; Property “StandbyArchiveLocation” updated
EDIT DATABASE (Rename)	Changes the name used by the broker to refer to the specified database. This command can only be done when broker management of the database that you are renaming is disabled. DGMGRL> EDIT DATABASE ‘secondary’ RENAME TO ‘standby’; Error: ORA-16602: database must be disabled to perform this operation Failed. DGMGRL> disable database secondary; Disabled. DGMGRL> EDIT DATABASE ‘secondary’ RENAME TO ‘standby’; Succeeded.
EDIT DATABASE (State)	Changes the state of the specified database.The state in which you want the database to be running. The possible states are: ONLINE LOG-TRANSPORT-OFF (primary database only) LOG-APPLY-OFF (standby database only) READ-ONLY (physical standby database only) OFFLINE DGMGRL> EDIT DATABASE ‘secondary’ SET STATE=‘READ-ONLY’; Succeeded. DGMGRL> EDIT DATABASE ‘secondary’ SET STATE=‘OFFLINE’; Operation requires shutdown of instance “secondary” on database “secondary” Shutting down instance “secondary”… Database closed. Database dismounted. ORACLE instance shut down. DGMGRL> EDIT DATABASE ‘secondary’ SET STATE=’ONLINE’; Error: ORA-16525: the Data Guard broker is not yet available – Make sure Standby Database is started and mounted. DGMGRL> EDIT DATABASE ‘secondary’ SET STATE=’ONLINE’; Succeeded.
EDIT INSTANCE (AUTO PFILE)	Sets the name of the initialization parameter file for the specified instance. DGMGRL> EDIT INSTANCE ‘secondary’ ON DATABASE ‘secondary’ SET AUTO PFILE=’initse condary.ora’; Instance “secondary” updated
EDIT INSTANCE (Property)	Changes the value of a property for the specified instance. DGMGRL> EDIT instance ‘primary’ on DATABASE ‘primary’ SET PROPERTY ‘StandbyArchi veLocation’='C:\app\patilsa\product\11.2.0\dbhome_1\oradata’; Property “StandbyArchiveLocation” updated
ENABLE CONFIGURATION	Enables broker management of the broker configuration and all of its databases.DGMGRL> enable configuration; Enabled.
ENABLE DATABASE	Enables broker management of the specified database. DGMGRL> enable database ‘secondary’; Enabled.
EXIT	Exits the Data Guard command-line interface.
FAILOVER	Performs a database failover operation in which the standby database, to which the CLI is currently connected, fails over to the role of primary database.
HELP	Displays online help for the Data Guard command-line interface. add Adds a standby database to the broker configuration connect Connects to an Oracle database instance convert Converts a database from one type to another create Creates a broker configuration disable Disables a configuration, a database, or fast-start failover edit Edits a configuration, database, or instance enable Enables a configuration, a database, or fast-start failover exit Exits the program failover Changes a standby database to be the primary database help Displays description and syntax for a command quit Exits the program reinstate Changes a database marked for reinstatement into a viable standby rem Comment to be ignored by DGMGRL remove Removes a configuration, database, or instance show Displays information about a configuration, database, or instance shutdown Shuts down a currently running Oracle database instance start Starts the fast-start failover observer startup Starts an Oracle database instance stop Stops the fast-start failover observer switchover Switches roles between a primary and standby database Use "help <command>" to see syntax for individual commands
QUIT	Quits the Data Guard command-line interface.
REMOVE CONFIGURATION	Removes the broker configuration including all of its database profiles from the broker configuration file.
REMOVE DATABASE	Removes the specified standby database profile from the broker configuration. DGMGRL> remove database secondary; Removed database “secondary” from the configuration
REMOVE INSTANCE	Removes knowledge of an instance from an existing database profile in the broker configuration.
SHOW CONFIGURATION	Displays information about the broker configuration. DGMGRL> SHOW CONFIGURATION Configuration – sample Protection Mode: MaxPerformance Databases: primary – Primary database Warning: ORA-16789: standby redo logs not configured secondary – Physical standby database Fast-Start Failover: DISABLED Configuration Status: WARNING
SHOW DATABASE	Displays information about the specified database. DGMGRL> show database secondary; Database – secondary Role: PRIMARY Intended State: TRANSPORT-ON Instance(s): secondary Database Status: SUCCESS
SHOW INSTANCE	Displays information about the specified instance. DGMGRL> show instance primary; Instance ‘primary’ of database ‘primary’ Host Name: sagar-pc Instance Status: SUCCESS DGMGRL> show instance secondary; Instance ‘secondary’ of database ‘secondary’ Host Name: sagar-pc Instance Status: SUCCESS
SHUTDOWN	Shuts down a currently running Oracle database. DGMGRL> connect sys/manager@primary Connected. DGMGRL> shutdown abort; ORACLE instance shut down.
STARTUP	Starts an Oracle instance with the same options as SQL*Plus, including mounting and opening a database. DGMGRL> connect sys/manager@primary Connected. DGMGRL> startup; ORACLE instance started. Database mounted. Database opened.
SWITCHOVER	Performs a switchover operation in which the current primary database becomes a standby database, and the specified standby database becomes the primary database. DGMGRL> switchover to secondary; Performing switchover NOW, please wait… New primary database “secondary” is opening… Operation requires shutdown of instance “primary” on database “primary” Shutting down instance “primary”… ORA-01109: database not open Database dismounted. ORACLE instance shut down. Operation requires startup of instance “primary” on database “primary” Starting instance “primary”… Unable to connect to database ORA-12514: TNS:listener does not currently know of service requested in connectdescriptor Failed. (IGNORE TNS error) Warning: You are no longer connected to ORACLE. Please complete the following steps to finish switchover: start up and mount instance “primary” of database “primary”

Monday, August 15, 2016

log file sync in RAC 11GR1

Problem
Huge Log file sync waits during 3 hour run of ETL job (Extract, transform, Load) on one of the nodes of 2 Node RAC cluster.

DBtimeMonitor - Product by Dominic Giles

Environment
2 Node RAC
Block Size: 16K
ClusterWare Version: 11.1.0.7
RDBMS and ASM Version: 11.1.0.7.9
OS : AIX 6.1
Storage Vendor : Oceanstor s2900
Number of LMS processes: 2 per node, running in Real Time priority as expected.

Background
Log file sync wait for Single Instance DB has been explained in depth in many blogs. I will take a moment to extend its definition for RAC and summarize the sequence of events involved in log file sync wait:
1.) User issues commit/rollback(ends with a commit) and wakes up log writer (LGWR). The wait time for log file sync starts here.
2.) LGWR issues/queues write call to OS to write to disk, refreshes the logical scn and asynchronously posts LMS to broadcast this SCN (BOC) from the local instance to other instances in cluster.
While I/O subsystem completes the write to disk, LGWR itself starts waiting on ‘Log File Parallel Write‘.
Wait counter for ‘wait for scn ack‘ also starts ticking now.
3.a) Once write is complete, OS posts back to LGWR. This is where wait counter for log file parallel write stops. For every slow write to disk (log write taking more than 500ms), a warning is written to LGWR tracefile “Warning: log write time 550ms, size 17KB“
3.b) By this time, if local node has not received the SCN acknowledgement for the BOC post sent to remote nodes in step 2, LGWR continues to wait on “wait for scn ack” event. If this wait for scn acknowledgement is more than 500ms, another warning get logged to the lgwr tracefile “Warning: log write broadcast wait time 670ms”
Note: In 11.1 RAC, For a simple commit statement, it is easy to see this otherwise invisible wait event by setting “_lgwr_async_broadcasts”=false and by enabling 10046 trace on LGWR.
11.2 has another interesting parameter “_lgwr_posts_for_pending_bcasts” which is set to False by default. I am yet to test its impact on log file sync waits.
4.) Once acknowledgement is received by LGWR from other instances , LGWR posts back to user that write is complete (commit is durable).
5.) Right after posting back to user session, log file sync wait counter stops.
Its worth noting that with all these steps involved in log file sync waits, CPU scheduling plays a crucial role in almost every step. It is possible that LGWR doesn’t get CPU cycle immediately and has to wait in run-queue. All this wait time would add to the user ‘log file sync’ wait time.

Analysis

Areas to analyze:
a.) Commit Frequency
b.) I/O response time
c.) CPU starvation
d.) Bugs relevant to BOC scheme.

Commit Frequency
Number of commits were high. Sections from AWR:

             Snap Id           Snap Time       Sessions Curs/Sess

             ---------     ------------------- -------- ---------

 Begin Snap: 10525         17-Feb-12 02:00:53     118    2.1

 End Snap:   10526         17-Feb-12 03:00:04     158    2.7

 Elapsed:       59.17 (mins)

 DB Time:    1,020.28 (mins)

Top <del>5</del> 2 Timed Foreground Events

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                                                        Avg

                                                        wait   % DB

Event                            Waits         Time(s)  (ms)   time   Wait Class

------------------------------ ------------ ----------- ------ ------ ----------

log file sync                    12,689,306     36,776    3     60.1     Commit

DB CPU                                           5,164           8.4

Statistic                            Total            per Second     per Trans

-------------------------------- ------------------ -------------- -------------

redo size                            10,635,966,936    2,995,679.1         799.5

user I/O wait time                          608,253          171.3           0.1

user calls                               17,928,502        5,049.7           1.4

user commits                             12,801,929        3,605.7           1.0

user rollbacks                              501,433          141.2           0.0

I/O Response Time
LGWR tracefile showed some latency in write time:

*** 2012-02-16 02:53:21.691

Warning: log write time 550ms, size 17KB

Here is output from v$event_histogram for log file parallel write wait, for this db instance (started 3 days before the analysis):

EVENT#             EVENT           WAIT_TIME_MILLI WAIT_COUNT

---------- ------------------------ --------------- ----------

 126       log file parallel write      1            59580471

 126       log file parallel write      2             3022225

 126       log file parallel write      4             1797856

 126       log file parallel write      8              624168

 126       log file parallel write     16              169960

 126       log file parallel write     32               46138

 126       log file parallel write     64               18218

 126       log file parallel write    128                4825

 126       log file parallel write    256                1188

 126       log file parallel write    512                 353

 126       log file parallel write   1024                  39

 126       log file parallel write   2048                   2

Redo Logfiles were/are not multiplexed. However, stored on same RAID 5 disk array as Datafiles. A new REDO_DG was created using disks from a different disk array than datafiles (though this one was also RAID 5) and redo logfiles from both instances were moved to a new disk array. Result: Little improvement.

*** 2012-02-18 03:26:35.293

Warning: log write time 500ms, size 9KB

In the snapshot above, updisk26 represents the disk holding the redo logfiles. In this snapshot, Busy% is Zero. However, there were some snapshots taken with busy% around 40 and an occasional 90.
As disk is part of RAID5 array, Parity calculation(even though it happens in background for newer SAN controllers) for every small write to disk might be adding to problems.
CPU Usage
Below is one of the many snapshots taken during the ETL run. Tool used : NMON (AIX)
CPU usage peeked 100%. Reason: An application Module is also running on the DB Host, which is hogging CPU time.

LGWR is not getting enough CPU cycles ?

Bugs relevant to BOC scheme
MOS has a really educative note (ID: 259454.1) explaining different SCN generation Schemes i.e. Lamport SCN Generation and Broadcast on Commit SCN generation Scheme. It also discusses the reason to switch to BOC scheme as default SCN generating Scheme starting 10gR2.
During the ETL job, following messages appeared in lgwr trace, which to me needed more research as to why and how this can impact log file sync waits. Thats how I came up with the fresh write-up for the log file sync wait event for RAC.

*** 2012-02-17 02:17:33.868

Warning: log write broadcast wait time 670ms

*** 2012-02-17 22:30:54.316

Warning: log write broadcast wait time 3000ms

-- RAC Specific waits from AWR Report

Statistic                            Total            per Second     per Trans

-------------------------------- ------------------ -------------- -------------

broadcast on commit wait time(ms)      937,303         264.0          0.1

broadcast on commit waits            1,669,140         470.1          0.1

Reasons for these waits mostly point to a bug with symptoms of : huge wait time for ‘wait for scn ack’ or lgwr unable to handle multiple posts at the same time.
There are three bugs reported on top of 11.1.0.7 related to log file sync waits & BOC and all of them are worth looking at.
Bug 8490879 – Long “log file sync” latencies due to broadcast on commit scheme
Bug 7716356 – Long “log file sync” latencies with broadcast on commit scheme in RAC
Bug 7610362 – Long “log file sync” waits in RAC with broadcast on commit in RAC

Finally, Options to Consider
a.) Increase number of CPUs on the system. Then, if needed, bump up the priority of LGWR (renice) at OS level. If needed put LGWR in the list of “_high_priority_processes”. LNS and VKTM are already in the list on 11.1
b.) RAID5 is said to provide reasonable sequential write performance, but I am yet to test it. RAID 0 (Striping only) might prove to be a better bet for log files. It is also possible that this I/O subsystem is unable to perform as expected with writes which are smaller than the Stripe size defined for RAID.
c.) Move ETL module of application hosted on the DB Server to a different host.
d.) Reduce the number of commits.
e.) Raise an SR to get a confirmation if problem is due to any known bugs with 11.1.0.7 version.

ORACLE DATABASE ADMINISTRATOR

Search

Tuesday, August 16, 2016

Oracle Data Guard Version 11.2.0.3

Monday, August 15, 2016

log file sync in RAC 11GR1

Followers