Search

Tuesday, August 16, 2016

Oracle Data Guard Version 11.2.0.3

General Information

Data Guard is a technology built into Enterprise Edition. It is not a download or an add-on and with the excpetion of Active Data Guard fully included, as of this writing, in the basic EE license.
The following was posted by DG Product Manager Larry Carpenter to the OTN forums on 1-Jun-2010. A small amount of editing has been done for consistency.

Maximum Protection and Maximum Availability are Data Guard's Zero Data Loss protection modes. The only difference is that in Maximum Protection Data Guard will abort the Primary (not allowing a transaction to be reported as committed to the application) if the redo cannot be written to a SYNC standby. In Maximum Availability it is allowed for a transaction to be committed even if there are no SYNC standby destinations available. That would be the 1st failure, the standby is not reachable. So when the Primary failed after that (the 2nd failure) you would of course lose data, the data that was generated while the standby was not reachable before the Primary failed.

However, this is not your situation. What you are experiencing is a misunderstanding of how the databases work.

In Oracle Versions prior to 11.2 the SYNC redo transport works as follows.

1. User says COMMIT.
2. LGWR writes the redo to the Online Redo Log.
3. The LNS processes grabs the same redo from the Log Buffer AFTER the LGWR has written the redo to the Online redo log file, and sends it to their standby.
4. When the SYNC standbys all report back that the redo has been received and written to the standby redo log file OR they fail to respond within NET_TIMEOUT seconds is the client told that the transaction has in fact committed.

What you are experiencing is the failure is happening between step 2 and step 3. The redo is written to the online redo log file at the primary but the client has not been told that the transaction has actually committed.

When you failover and do the select count(*) from the Primary you have the exact count of transactions that the client was actually told had been committed. When you then restarted the Primary what does it do? Crash recovery! Which will recover the redo that was actually written to the online redo log file which is that 'last' row which was the current transaction but was never actually 'committed' to the application. Your did not actually lose that row as the application was never told that it was committed.

In a real situation you would have failed over and then when the Primary came back you would have reinstated it as a standby and not opened it.

Try your test where your test application prints out the counter AFTER the commit has finished and then pull the plug, failover and see if the count in the table on the standby matches the counter in the application.

Final result, you didn't lose any data that the application was told was committed.

Larry

P.S. As of 11.2 the write to the Online Redo Log file by the LGWR and the send to the SYNC standby happens in parallel so you will see even less of what you saw when you try your test in the same way in 11.2)
Acronyms
Acronym Short For
ARC0 Archiver Process on Local Server
ARCH Archiver
ASYNC Asynchronous
DMON Data Guard Monitor
DR Disaster Recovery
FAL Fetch Archive Log
LCR Logical Change Record
LGWR Logwriter
LNS Network Server Process
LSP Logical Standby Process (SQL Apply for Logical DG)
MRP Managed Recovery Process (Redo Apply for Physical DG)
RFS Remote File Server
SYNC Synchronous
TAF Transparent Application Failover
Data Dictionary Objects for Physical DataGuard
DBA_LOG_GROUPS DBA_HIST_LOG GV$MANAGED_STANDBY
DBA_LOG_GROUP_COLUMNS GV$DATAGUARD_CONFIG V$DATABASE
DBA_REGISTERED_ARCHIVED_LOG GV$DATAGUARD_STATUS V$STANDBY_EVENT_HISTOGRAM
Data Dictionary Objects for Logical DataGuard
DBA_LOGSTDBY_EVENTS DBA_LOGSTDBY_SKIP GV$DATAGUARD_CONFIG
DBA_LOGSTDBY_HISTORY DBA_LOGSTDBY_SKIP_TRANSACTION GV$DATAGUARD_STATS
DBA_LOGSTDBY_LOG DBA_LOGSTDBY_UNSUPPORTED GV$DATAGUARD_STATUS
DBA_LOGSTDBY_NOT_UNIQUE DBMS_DRS GV$LOGSTDBY
DBA_LOGSTDBY_PARAMETERS DBMS_LOGSTDBY GV$LOGSTDBY_STATS
DBA_LOGSTDBY_PROGRESS GV$ARCHIVE_GAP GV$STANDBY_LOG
Docs
Doc Link Data Type
Click Here 11g Data Guard Docs
Click Here 11g Data Guard - What's New
Click Here Data Guard Protection Modes
Click Here Switchover Role Transitions
Click Here Fast start Failovers
Click Here Data Guard Broker
 
Data Guard Broker
Start Data Guard Manager $ DGMGRL
Add a standby database to the broker configuration ADD DATABASE <database_name> [as connect identifier is <connect-identifier>];
DGMGRL> add database proda;
Connect to the database via the broker CONNECT username[@connect-identifier]
DGMGRL> connect sys/syspwd
Connected.
Error:
ORA-16525: the Data Guard broker is not yet available
ORA-06512: at SYS:DBMS_DRS", line 124
ORA-06512: at line 1

SQL> ALTER SYSTEM SET dg_broker_start = TRUE;
System altered.

DGMGRL> connect sys/syspwd
Connected.
DGMGRL> exit

SQL> alter system set DG_BROKER_START = FALSE;
Convert or revert a standby between physical and snapshot CONVERT <db_unique_name> TO <SNAPSHOT | PHYSICAL> STANDBY;
DGMGRL> CONVERT DATABASE prodb TO snapshot standby;
Create a broker configuration CREATE CONFIGURATION <configuration-name> AS
PRIMARY DATABASE IS <database-name>
CONNECT IDENTIFIER IS <connect-identifier>;
DGMGRL> CREATE CONFIGURATION 'dg_prod_cfg' AS
PRIMARY DATABASE IS
'proda'
CONNECT IDENTIFIER IS proda.mlib.com;
Disable a configuration DISABLE CONFIGURATION;
DGMGRL> DISABLE CONFIGURATION;
Disable a database DISABLE DATABASE <database-name>;
DGMGRL> DISABLE DATABASE prodb;
Disable Fast-Start Failover DISABLE FAST_START FAILOVER [FORCE];
DGMGRL> DISABLE FAST_START FAILOVER FORCE;
Disable Fast-Start Failover Condition DISABLE FAST_START FAILOVER CONDITION <value>;
DGMGRL> DISABLE FAST_START FAILOVER CONDITION "corrupted controlfile";
Edit a configuration property EDIT CONFIGURATION SET PROPERTY <property_name> = <value>;
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold=60;

DGMGRL> EDIT DATABASE 'site1_edrsr8p1' SET PROPERTY 'LogXptMode'='SYNC';
Edit configuration (protection mode) EDIT CONFIGURATION SET PROTECTION MODE AS <MAXPROTECTION | MAXAVAILABILITY | MAXPERFORMANCE>;
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
Edit database property EDIT DATABASE <database-name> SET PROPERTY=<value>;
DGMGRL> EDIT DATABASE prodb SET PROPERTY='ArchiveLagTarget'=900;
Edit database rename EDIT DATABASE <database-name> RENAME TO =<new-database-name>;
DGMGRL> DISABLE DATABASE prodb;
DGMGRL> EDIT DATABASE prodb RENAME TO='prodc';
DGMGRL> ENABLE DATABASE prodc;
Edit database state EDIT DATABASE <database-name> SET STATE=<database state>
[WITH APPLY INSTANCE=<instance-name>];

Database State
APPLY-OFF (physical or logical standby database only)
APPLY-ON (physical or logical standby database only)
TRANSPORT-OFF (primary database only)
TRANSPORT-ON (primary database only)
DGMGRL> EDIT DATABASE prodb SET STATE=apply-off;
Edit Instance EDIT INSTANCE <instance-name> [ON DATABASE <database-name>]
SET auto pfile=<initialization-file | off>;
DGMGRL> EDIT INSTANCE 'prodb' ON DATABASE 'prodb' SET AUTO PFILE='initprodb.ora';

DGMGRL> EDIT INSTANCE 'prodb' ON DATABASE 'prodb'
SET PROPERTY 'StandbyArchiveLocation'='/u03/prodb/arch/';
Enable Configuration ENABLE CONFIGURATION;
DGMGRL> ENABLE CONFIGURATION;
Enable Database ENABLE DATABASE <database-name>;
DGMGRL> ENABLE DATABASE prodb;
Enable Fast_Start Failover ENABLE FAST_START FAILOVER;
DGMGRL> ENABLE FAST_START FAILOVER;
Enable Fast_Start Failover Condition ENABLE FAST_START FAILOVER CONDITION <value>;

Fast_Start Failover Conditions
Corrupted Controlfile Corrupted controlfile
Corrupted Dictionary Dictionary corruption of a critical database object
Datafile Offline Data file offline due to a write error
Inaccessible Logfile LGWR is unable to write to any member of a log group due to an I/O error
Stuck Archiver Archiver is unable to archive a redo log because device is full or unavailable
DGMGRL> ENABLE FAST_START FAILOVER CONDITION "Corrupted Dictionary";
Exit the Data Guard Manager program EXIT;
DGMGRL> EXIT;
Change a standby database to be the primary database FAILOVER TO <database-name> [IMMEDIATE];
DGMGRL>FAILOVER TO prodb IMMEDIATE;
Display description and syntax for a command HELP;
DGMGRL> HELP;

DGMGRL> HELP SHOW;
Exit the program QUIT;
DGMGRL> QUIT;
Change a disabled database into a viable standby database REINSTATE DATABASE <database_name>;
DGMGRL>REINSTATE DATABASE prodb;
Comment (remark) to be ignored by DGMGRL rem <remark_string>;
DGMGRL>REM 'this is a comment';
Removes config info, including all DB profiles and terminates management of all of associated databases REMOVE CONFIGURATION [PRESERVE DESTINATIONS];
DGMGRL> SHOW CONFIGURATION;
DGMGRL> REMOVE CONFIGURATION PRESERVE DESTINATIONS;
DGMGRL> SHOW CONFIGURATION;
Removes the specified standby database's profile from the broker configuration and terminates broker management of the standby database REMOVE DATABASE <database_name> [PRESERVE DESTINATIONS];
DGMGRL> REMOVE DATABASE proda PRESERVE DESTINATIONS;
Removes the specified standby database's profile from the broker configuration and terminates broker management of the standby database REMOVE INSTANCE <instance-name> [ON DATABASE <database-name>];
DGMGRL> SHOW CONFIGURATION;
DGMGRL> REMOVE INSTANCE proda ON DATABASE proda;
DGMGRL> SHOW CONFIGURATION;
Display information about a configuration, database, and/or instance SHOW CONFIGURATION [VERBOSE | <property_name>];
DGMGRL> SHOW CONFIGURATION VERBOSE;
Display information about a database SHOW DATABASE [VERBOSE] <database-name> [<property-name>];
DGMGRL> SHOW DATABASE VERBOSE;
Display information about a fast_start failover configuration SHOW FAST_START FAILOVER;
DGMGRL> SHOW FAST_START FAILOVER;
Display information about an instance SHOW INSTANCE [VERBOSE] <instance-name> [ON DATABASE <database-name>];
DGMGRL> SHOW INSTANCE VERBOSE;
Shutdown a currently running Oracle instance SHUTDOWN [<ABORT | IMMEDIATE | NORMAL>];
DGMGRL> SHUTDOWN IMMEDIATE;
Start Fast-Start Failover observer START OBSERVER [FILE=observer_configuration_filename];
DGMGRL> START OBSERVER '/u01/oracle/admin/orabase/dgbroker/orabase_dgcfg';
Start an Oracle database instance STARTUP [FORCE] [RESTRICT] [PFILE=<filename>]
[<MOUNT | NOMOUNT | OPEN <READ ONLY | READ WRITE>>];
DGMGRL> STARTUP;
Stop the Fast_Start Failover Observer STOP OBSERVER;
DGMGRL> STOP OBSERVER;
Switch roles between the primary database and a standby database SWITCHOVER TO <database-name>;
DGMGRL> SWITCHOVER TO prodc;
 
Grid Infrastructure Data Guard
Role Swap On Physical Standby srvctl modify database -d <physStandby> -s open -r primary
srvctl modify database -d TSTC -s open -r primary
Role Swap On Primary srvctl modify database -d <Primary> -s mount -r physical_standby
srvctl modify database -d TSTA -s open -r physical_standby
 
12cR1 New Features
Far Sync A Data Guard Far Sync instance is a remote Data Guard destination that accepts redo from the primary database and then ships that redo to other members of the Data Guard configuration.

Manages a control file, receives redo into standby redo logs (SRLs), and archives those SRLs to local archived redo logs.

A Far Sync instance does not have user data files, cannot be opened for access, cannot run redo apply, and can never function in the primary role or be converted to any type of standby database.

Far Sync instances are part of the Active Data Guard Far Sync feature, which requires an Active Data Guard license.
TBD
FastSync LOG_ARCHIVE_DEST_n attributes SYNC and NOAFFIRM can be used together enabling a synchronous standby to be deployed at a futher distance from the primary site.
TBD
 
Operating System Installation
As root: Create DBA Group and Oracle User (both servers) $ /usr/sbin/groupadd oinstall
$ /usr/sbin/groupadd dba
$ /usr/sbin/useradd -m -g oinstall -G dba oracle
$id oracle

$passwd oracle
-- set the password to oracle1

$ id nobody
-- if nobody does not exist then
$ /usr/sbin/useradd nobody
As root: Create Installation Directories (both servers) mkdir -p /app/oracle
chown -R oracle:dba /app/oracle
chmod -R 775 /app/oracle

mkdir /stage
chown -R oracle:dba /stage
As root: Alter Kernel Parameters in sysctl.conf by appending the highlighted text

 (both servers)
cd /etc

vi sysctl.conf

fs.file-max = 65536
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
# send receive buffers
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 8388608
net.core.wmem_max = 8388608
# tcp read and write buffers
net.ipv4.tcp_rmem=4096 262144 8388608
net.ipv4.tcp_wmem=4096 262144 8388608

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.ip_local_port_range = 1024 65000

-- activate changes
/sbin/sysctl -p
As root: Set Shell Limits

 (both servers)
cd /etc/security

-- modify limits.conf
cat >> /etc/security/limits.conf <<EOF
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536
EOF

-- modify /etc/pam.d/ login
-- read the file and place above the last lines as indicated

session required /lib/security/pam_limits.so
session required pam_limits.so
As root: Change Default Profile

 (both servers)
cd /etc

-- if Bourne, Korn or Bash shell
cat >> /etc/profile <<EOF
if [ \$USER = "oracle" ]; then
  if [ \$SHELL = "/bin/ksh" ]; then
    ulimit -p 16384
    ulimit -n 65536
  else
    ulimit -u 16384 -n 65536
  fi
  umask 022
fi
EOF
As root: Append servers to hosts file

 (both servers)
cd /etc

Note: the line: 127.0.0.1   omega#.mlib.org   omega#   localhost.localadmin localhost must remain. If you delete it ... nothing will work properly

vi hosts

192.168.1.101 omega1 omega1.mlib.org
192.168.1.102 omega2 omega2.mlib.org
10.0.1.1 dgomega1 dgomega1.mlib.org
10.0.1.2 dgomega2 dgomega2.mlib.org
192.168.10.01 omega1st omega1st.mlib.org
192.168.10.02 omega2st omega2st.mlib.org
192.168.10.100 netapp netapp.mlib.org
192.168.1.119 bigdog bigdog.mlib.org

-- ping all hosts (substitute actual node names for the "omega")
ping 192.168.1.1
ping www.oracle.com
ping omega1
ping omega2
ping omega1.mlib.org
ping omega2.mlib.org
Modify .bashrc (both servers) cd /home/oracle

vi .bashrc

umask 022
ORACLE_BASE=/app/oracle/product
ORACLE_HOME=/app/oracle/product/11.2.0/dbhome_1
# ORACLE_SID=proda
# ORACLE_SID=prodb
PATH=$ORACLE_HOME/bin:$PATH
export PATH ORACLE_BASE ORACLE_HOME ORACLE_SID

-- sid name on omega1 is proda
-- sid name on omega2 is prodb
initproda.ora proda.__db_cache_size=687865856
proda.__java_pool_size=16777216
proda.__large_pool_size=16777216
proda.__oracle_base='/app/oracle/product'
proda.__pga_aggregate_target=671088640
proda.__sga_target=989855744
proda.__shared_io_pool_size=0
proda.__shared_pool_size=251658240
proda.__streams_pool_size=0
*.audit_file_dest='/app/oracle/product/admin/proda/adump'
*.audit_trail='db'
*.compatible='11.1.0.0.0'
*.control_files= '/app/oracle/product/oradata/proda/control01.ctl', '/app/oracle/product/oradata/proda/control02.ctl',
'/app/oracle/product/oradata/proda/control03.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='proda'
*.db_recovery_file_dest='/app/oracle/product/fast_recovery_area'
*.db_recovery_file_dest_size=2147483648
*.diagnostic_dest='/app/oracle/product'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=prodaXDB)'
*.memory_target=1658847232
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.undo_tablespace='UNDOTBS1'
ifile=?/dbs/proda_dg.ora
initprodb.ora prodb.__db_cache_size=687865856
prodb.__java_pool_size=16777216
prodb.__large_pool_size=16777216
prodb.__oracle_base='/app/oracle/product'
prodb.__pga_aggregate_target=671088640
prodb.__sga_target=989855744
prodb.__shared_io_pool_size=0
prodb.__shared_pool_size=251658240
prodb.__streams_pool_size=0
*.audit_file_dest='/app/oracle/product/admin/proda/adump'
*.audit_trail='db'
*.compatible='11.1.0.0.0'
*.control_files='/app/oracle/product/oradata/proda/control01.ctl', '/app/oracle/product/oradata/proda/control02.ctl', '/app/oracle/product/oradata/proda/control03.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='proda'
*.db_recovery_file_dest='/app/oracle/product/fast_recovery_area'
*.db_recovery_file_dest_size=2147483648
*.diagnostic_dest='/app/oracle/product'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=prodbXDB)'
*.memory_target=1658847232
*.open_cursors=300
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.undo_tablespace='UNDOTBS1'
ifile=?/dbs/prodb_dg.ora
On omega1 create a password file $ cd $ORACLE_HOME/dbs

$ ls -la

orapwd file=orapwproda password=oracle1 ignorecase=y

set linesize 121
col name format a30
col value format a50

SELECT name, value
FROM gv$parameter
WHERE name LIKE '%remote_login%';
On omega2 create a password file $ cd $ORACLE_HOME/dbs

$ ls -la

orapwd file=orapwprodb password=oracle1 ignorecase=y

set linesize 121
col name format a30
col value format a50

SELECT name, value
FROM gv$parameter
WHERE name LIKE '%remote_login%';
omega1: listener.ora -- Note: SDU needs to be a multiple of MTU

# /sbin/ifconfig
# listener.ora Network Configuration File: /app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

DG_LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.#.1)(PORT = 1526))
      (SEND_BUF_SIZE=9375000)
      (RECV_BUF_SIZE=9375000))
    )
  )

SID_LIST_DG_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SDU = 32767)
      (GLOBAL_DBNAME = proda)
      (ORACLE_HOME = /app/oracle/product/11.2.0/dbhome_1)
      (SID_NAME = proda)
    )
  )

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = omega1.mlib.org)(PORT = 1521))
    )
  )

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /app/oracle/product/11.2.0/dbhome_1)
      (PROGRAM = extproc)
   )
  )
omega2: listener.ora # listener.ora Network Configuration File: /app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

DG_LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.#.2)(PORT = 1526)
    (SEND_BUF_SIZE=9375000)
    (RECV_BUF_SIZE=9375000))
  )

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /app/oracle/product/11.2.0/dbhome_1)
      (PROGRAM = extproc)
    )
  )

SID_LIST_DG_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SDU=32767)
      (GLOBAL_DBNAME = prodb)
      (ORACLE_HOME = /app/oracle/product/11.1.0/dbhome_1)
      (SID_NAME = prodb)
    )
  )

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = omega2.mlib.org)(PORT = 1521))
    )
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
  )
both servers: sqlnet.ora NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)
DEFAULT_SDU_SIZE=32767
both servers: tnsnames.ora # tnsnames.ora Network Configuration File: /app/oracle/product/11.2.0/dbhome_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.


DGLOGSHIPB =
  (DESCRIPTION =
    (SDU = 32767)
    (SEND_BUF_SIZE=9375000)
    (RECV_BUF_SIZE=9375000)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.#.2)(PORT = 1526))
    )
   (CONNECT_DATA =
      (SERVICE_NAME = prodb)
    )
  )

DGLOGSHIPA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.#.1)(PORT = 1526))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = proda)
    )
  )

PRODB =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = omega2.mlib.org)(PORT =1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = prodb)
    )
  )

PRODA =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = omega1.mlib.org)(PORT =1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = proda)
    )
  )

REPOS =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.119)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = repos)
    )
  )
 
Physical Data Guard
dg_proda.ora

Note: Set net_timeout between 10-30 sec.
remote_login_passwordfile='EXCLUSIVE'

# set max_processes equal to max_connections + 1 for local
log_archive_max_processes=4

log_archive_config='DG_CONFIG=(proda,prodb)'
db_unique_name='proda'
# db_name='proda'
# service_names='proda'

fal_client=proda
fal_server=prodb

standby_file_management='auto'

log_file_name_convert=' ',' '

log_archive_dest_1='LOCATION=/app/oracle/product/fast_recovery_area/arch
valid_for=(all_logfiles,all_roles) db_unique_name=proda'

log_archive_dest_2='service=dglogshipb valid_for=(online_logfile,primary_role) db_unique_name=prodb NET_TIMEOUT=15 REOPEN=60 OPTIONAL MAX_CONNECTIONS=3'

log_archive_dest_state_1='ENABLE'
log_archive_dest_state_2='ENABLE'

# define archive log naming format
log_archive_format=%t+%s+%r.arc
dg_prodb.ora remote_login_passwordfile='EXCLUSIVE'
# set max_processes equal to max_connections + 1 for local
log_archive_max_processes=4

log_archive_config='DG_CONFIG=(prodb,proda)'
db_unique_name='prodb'
# db_name='proda'
# service_names='proda'

fal_client=prodb
fal_server=proda

standby_file_management='auto'

log_file_name_convert=' ',' '

log_archive_dest_1= 'LOCATION=/app/oracle/product/fast_recovery_area/arch
valid_for=(all_logfiles,all_roles) db_unique_name=prodb'

log_archive_dest_2='service=dglogshipa valid_for=(online_logfile,primary_role) db_unique_name=proda
NET_TIMEOUT=15 REOPEN=60 OPTIONAL MAX_CONNECTIONS=3'

log_archive_dest_state_1='ENABLE'
log_archive_dest_state_2='ENABLE'

# define archive log naming format
log_archive_format=%t+%s+%r.arc
Prepare omega1 for backup SQL> startup mount;
SQL> show parameter %archive%

SQL> SELECT inst_id, log_mode FROM v$database;

SQL> alter database archivelog;
SQL> alter database force logging;
SQL> alter database add supplemental log data;
SQL> alter database open;
SQL> archive log list;
SQL> alter system switch logfile;
SQL> archive log list;
SQL> SELECT dest_id, valid_type, valid_role, valid_now
     FROM gv$archive_dest;
Create RMAN catalog and backup omega1 -- as oracle on bigdog
$ sqlplus / as sysdba

-- using OMF
CREATE TABLESPACE cat_tbs;

CREATE USER repoomega#
IDENTIFIED BY repoomega#
DEFAULT TABLESPACE cat_tbs
TEMPORARY TABLESPACE temp
QUOTA UNLIMITED ON cat_tbs;

GRANT create session, recovery_catalog_owner TO repoomega#;
GRANT execute ON dbms_stats TO repoomega#-- for class only
Duplicate database for Standby on omega2 SQL> conn / as sysdba

SQL> startup nomount;

$ rman target sys/oracle1@proda auxiliary /

RMAN> duplicate target database for standby nofilenamecheck;

RMAN> exit;
On both servers create a password file $ cd $ORACLE_HOME/dbs

$ ls -la

orapwd file=orapwprodb password=oracle1

set linesize 121
col name format a30
col value format a50

SELECT name, value
FROM gv$parameter
WHERE name LIKE '%remote_login%';
Step 3: Configure Standby Redo Logs Minimally, the configuration should have one more standby redo log file group than the number of online redo log file groups on the primary database. However, the recommended number of standby redo log file groups is dependent on the number of threads on the primary database. Use the following equation to determine an appropriate number of standby redo log file groups:

(max. number of logfiles for each thread + 1) * max. number of threads

Using this equation reduces the likelihood that the primary instance's log writer (LGWR) process will be blocked because a standby redo log file cannot be allocated on the standby database. For example, if the primary database has 2 log files for each thread and 2 threads, then 6 standby redo log file groups are needed on the standby database.

By default Oracle has 1 thread with three log files so: (3+1) * 1 = 4
 
Physical Standby Commands
Complete applying all available redo in preparation for a failover. Use the FINISH clause only in the event of primary database failure. ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE [FINISH];
ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE FINISH;
Create standby logfile ALTER DATABASE ADD STANDBY LOGFILE '<path_and_file_name>' SIZE <integer> <M | G>;
ALTER DATABASE ADD STANDBY LOGFILE '/app/oracle/product/dbs/stdbylog01a.log' SIZE 50G;
Switchover of a primary database to standby ALTER DATABASE <PREPARE | COMMIT> TO SWITCHOVER TO PHYSICAL STANDBY [WITH SESSION SHUTDOWN];
SELECT switchover_status FROM v$database;

ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;

-- on the "new" primary if it the first time
ALTER SYSTEM SET DB_CREATE_FILE_DEST = '/u01/oradata/orabase';
ALTER TABLESPACE temp ADD TEMPFILE;
Convert a primary to a standby ALTER DATABASE CONVERT TO <PHYSICAL | SNAPSHOT> STANDBY;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
Start redo apply as a foreground process ALTER DATABASE RECOVER MANAGED STANDBY DATABASE;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE;
Stop redo apply ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Start redo apply as a background process ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT [FROM SESSION];
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
Cancel an apply time delay ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY;
Start real-time apply as a foreground process ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE;
Start real-time apply as a background process ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE [DISCONNECT FROM SESSION];
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
From the standby manually register log files from a failed primary ALTER DATABASE REGISTER [OR REPLACE] [PHYSICAL] LOGFILE <file_path_and_name>
[FOR <logminer_session_name>];
ALTER DATABASE REGISTER LOGFILE '/u01/dest1/stdbylog01a.log';
Change the protection mode ALTER DATABASE SET STANDBY DATABASE TO <AVAILABILITY | PERFORMANCE | PROCTECTION>;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
ALTER DATABASE OPEN;
Switchover of a primary database to standby ALTER DATABASE <PREPARE | COMMIT> TO SWITCHOVER TO ....;
SELECT switchover_status FROM V$database;

ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;
At failover send unsent redo to the standby ALTER SYSTEM FLUSH REDO SQL
ALTER SYSTEM FLUSH REDO SQL;
 
Active Data Guard
Description Active Data Guard isn't a type of standby. It is the option that allows you to open a physical standby read only and restart the apply. It also includes the right to turn on block change tracking. In 11.2 it also includes automatically repairing corrupt blocks. This option requires that your purchase an additional license.
Steps to convert a physical standby into an Active Data Guard standby -- stop redo application
alter database recover managed standby database cancel;

-- if RAC shut down all but one instance

-- the database should now be mounted but not open. Open the database in read-only mode

alter database open read only;

-- restart managed recovery
alter database recover managed standby database using current logfile disconnect;

-- verify that a fast recovery area has been configured.
 
Snapshot Standby
Description A snapshot standby database is a fully updatable standby database that is created by converting a physical standby database into a snapshot standby database. A snapshot standby database receives and archives, but does not apply, redo data from a primary database. Redo data received from the primary database is applied when a snapshot standby database is converted back into a physical standby database, after discarding all local updates to the snapshot standby database.

A snapshot standby database typically diverges from its primary database over time because redo data from the primary database is not applied as it is received. Local updates to the snapshot standby database will cause additional divergence. The data in the primary database is fully protected however, because a snapshot standby can be converted back into a physical standby database at any time, and the redo data received from the primary will then be applied.

A snapshot standby database provides disaster recovery and data protection benefits that are similar to those of a physical standby database. Snapshot standby databases are best used in scenarios where the benefit of having a temporary, updatable snapshot of the primary database justifies increased time to recover from primary database failures.

Snapshot standby does not require any extra licensing as it is part of Data Guard in the EE license.
Steps to convert a physical standby into a Snapshot standby -- on the physical standby stop the recovery process
alter database recover managed standby database cancel;

-- if RAC shut down all but one instance the database should now be mounted but not open. Open the database in read-only mode
alter database open read only;

-- restart managed recovery
alter database recover managed standby database disconnect;

-- verify that a fast recovery area has been configured

SQL> ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
Convert a snapshot to physical standby SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
 
Related SQL Statements
Examine Database Characteristics SELECT inst_id, open_resetlogs, open_mode, protection_mode, protection_level
FROM v$database;

SELECT inst_id, remote_archive, activation#, switchover#, database_role
FROM v$database;

SELECT inst_id, archivelog_compression, switchover_status, dataguard_broker, guard_status
FROM v$database;

SELECT inst_id, supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_ui, supplemental_log_data_fk, supplemental_log_data_all, force_logging
FROM v$database;

col name format a20
col db_unique_name format a20

SELECT inst_id, current_scn, db_unique_name, name, standby_became_primary_scn
FROM v$database;

col fs_failover_observer_host format a20

SELECT fs_failover_status, fs_failover_current_target, fs_failover_observer_present, fs_failover_observer_host
FROM v$database;
Data Guard Monitoring col hostname format a30

SELECT db_name, hostname, log_archived, log_applied, applied_time,
       log_archived-log_applied AS GAP
FROM (
  SELECT name AS DB_NAME
  FROM v$database),
     (
  SELECT UPPER(SUBSTR(host_name, 1, (DECODE(INSTR(host_name,'.'), 0,
  LENGTH(host_name), (INSTR(host_name,'.')-1))))) HOSTNAME
  FROM v$instance),
     (
  SELECT MAX(sequence#) AS LOG_ARCHIVED
  FROM v$archived_log
  WHERE dest_id = 1
  AND archived = 'YES'),
     (
  SELECT MAX(sequence#) AS LOG_APPLIED
  FROM v$archived_log
  WHERE dest_id = 2
  AND applied = 'YES'),
     (
  SELECT TO_CHAR(MAX(completion_time),'DD-MON/HH24:MI') AS APPLIED_TIME
  FROM v$archived_log
  WHERE dest_id = 2
  AND applied = 'YES');
Switchover Demo SELECT name, current_scn, guard_status, switchover_status, database_role, checkpoint_change#, archive_change#, archivelog_change#, db_unique_name, platform_name
FROM v$database;

ALTER SESSION SET NLS_DATE_FORMAT='DD-MON-YY HH24:MI:SS';

SELECT sequence#, first_change#, next_change#, standby_dest, completion_time, is_recovery_dest_file, fal
FROM v$archived_log;

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN WAIT;

STARTUP;

ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;

ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN WAIT;

SELECT SWITCHOVER_STATUS FROM GV$DATABASE;

ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN WAIT;
Diagnose Logical Data Guard Failures -- determine replication lag
SELECT name, value, unit
FROM v$dataguard_stats;

-- determine the schema was maintained by SQL Apply
SELECT owner
FROM dba_logstdby_skip
WHERE statement_opt = 'INTERNAL SCHEMA'
ORDER BY 1;

-- check the standby site for for error messages
col error format a30
col destination format a30

SELECT dest_id, status, destination, error
FROM v$archive_dest
ORDER BY 1;

    Command

Effect
ADD DATABASE Adds a new standby database profile to the existing broker configuration 
DGMGRL> ADD DATABASE ‘secondary’ AS CONNECT IDENTIFIER IS secondary MAINTAINED AS PHYSICAL;
CONNECT Connects to the specified database using the specified username. 
DGMGRL> connect sys/manager@primary
Connected.
CREATE CONFIGURATION Creates a broker configuration and creates and adds a primary database profile to the configuration. 
DGMGRL> CREATE CONFIGURATION ‘sample’ AS PRIMARY DATABASE IS ‘primary’ CONNECT IDENTIFIER IS primary;
Configuration “sample” created with primary database “primary”
DISABLE CONFIGURATION Disables broker management of a configuration so that the configuration and all of its databases are no longer managed by the broker. 
DGMGRL> disable configuration;
Disabled.
DISABLE DATABASE Disables broker management of the named standby database. 
DGMGRL> disable database secondary;
Disabled.
EDIT CONFIGURATION (Protection Mode) Changes the current protection mode setting for the broker configuration 
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS  MAXPerformance;
Succeeded.
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS  MAXPROTECTION;
Operation requires shutdown of instance “primary” on database “primary”
Shutting down instance “primary”…
Database closed.
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance “primary” on database “primary”
Starting instance “primary”…
EDIT DATABASE (Property) Changes the value of a property for the named database 
DGMGRL> EDIT DATABASE ‘primary’ SET PROPERTY ‘ArchiveLagTarget’=200;
Property “ArchiveLagTarget” updated
DGMGRL> EDIT DATABASE ‘primary’ SET PROPERTY ‘StandbyArchiveLocation’='C:\app\pa
tilsa\product\11.2.0\dbhome_1\oradata’;
Property “StandbyArchiveLocation” updated
EDIT DATABASE (Rename) Changes the name used by the broker to refer to the specified database. This command can only be done when broker management of the database that you are renaming is disabled. 
DGMGRL> EDIT DATABASE ‘secondary’ RENAME TO ‘standby’;
Error: ORA-16602: database must be disabled to perform this operation
Failed.
DGMGRL> disable database secondary;
Disabled.
DGMGRL> EDIT DATABASE ‘secondary’ RENAME TO ‘standby’;
Succeeded.
EDIT DATABASE (State) Changes the state of the specified database.The state in which you want the database to be running. The possible states are:
ONLINE
LOG-TRANSPORT-OFF (primary database only)
LOG-APPLY-OFF (standby database only)
READ-ONLY (physical standby database only)
OFFLINE 
DGMGRL> EDIT DATABASE ‘secondary’ SET STATE=‘READ-ONLY’;
Succeeded.
DGMGRL> EDIT DATABASE ‘secondary’ SET STATE=‘OFFLINE’;
Operation requires shutdown of instance “secondary” on database “secondary”
Shutting down instance “secondary”…
Database closed.
Database dismounted.
ORACLE instance shut down.
DGMGRL> EDIT DATABASE ‘secondary’ SET STATE=’ONLINE’;
Error: ORA-16525: the Data Guard broker is not yet available
– Make sure Standby Database is started and mounted.
DGMGRL> EDIT DATABASE ‘secondary’ SET STATE=’ONLINE’;
Succeeded.
EDIT INSTANCE (AUTO PFILE) Sets the name of the initialization parameter file for the specified instance. 
DGMGRL> EDIT INSTANCE ‘secondary’ ON DATABASE ‘secondary’ SET AUTO PFILE=’initse
condary.ora’;
Instance “secondary” updated
EDIT INSTANCE (Property) Changes the value of a property for the specified instance. 
DGMGRL> EDIT instance ‘primary’ on DATABASE ‘primary’ SET PROPERTY ‘StandbyArchi
veLocation’='C:\app\patilsa\product\11.2.0\dbhome_1\oradata’;
Property “StandbyArchiveLocation” updated
ENABLE CONFIGURATION Enables broker management of the broker configuration and all of its databases.DGMGRL> enable configuration;
Enabled.
ENABLE DATABASE Enables broker management of the specified database. 
DGMGRL> enable database ‘secondary’;
Enabled.
EXIT Exits the Data Guard command-line interface.
FAILOVER Performs a database failover operation in which the standby database, to which the CLI is currently connected, fails over to the role of primary database.
HELP Displays online help for the Data Guard command-line interface. 
add            Adds a standby database to the broker configuration
connect        Connects to an Oracle database instance
convert        Converts a database from one type to another
create         Creates a broker configuration
disable        Disables a configuration, a database, or fast-start failover
edit           Edits a configuration, database, or instance
enable         Enables a configuration, a database, or fast-start failover
exit           Exits the program
failover       Changes a standby database to be the primary database
help           Displays description and syntax for a command
quit           Exits the program
reinstate      Changes a database marked for reinstatement into a viable standby
rem            Comment to be ignored by DGMGRL
remove         Removes a configuration, database, or instance
show           Displays information about a configuration, database, or instance
shutdown       Shuts down a currently running Oracle database instance
start          Starts the fast-start failover observer
startup        Starts an Oracle database instance
stop           Stops the fast-start failover observer
switchover     Switches roles between a primary and standby database
Use "help <command>" to see syntax for individual commands
QUIT Quits the Data Guard command-line interface.
REMOVE CONFIGURATION Removes the broker configuration including all of its database profiles from the broker configuration file.
REMOVE DATABASE Removes the specified standby database profile from the broker configuration. 
DGMGRL> remove database secondary;
Removed database “secondary” from the configuration
REMOVE INSTANCE Removes knowledge of an instance from an existing database profile in the broker configuration.
SHOW CONFIGURATION Displays information about the broker configuration. 
DGMGRL> SHOW CONFIGURATION
Configuration – sample
Protection Mode: MaxPerformance
Databases:
primary   – Primary database
Warning: ORA-16789: standby redo logs not configured
secondary – Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
WARNING
SHOW DATABASE Displays information about the specified database. 
DGMGRL> show database secondary;
Database – secondary
Role:            PRIMARY
Intended State:  TRANSPORT-ON
Instance(s):
secondary
Database Status:
SUCCESS
SHOW INSTANCE Displays information about the specified instance. 
DGMGRL> show instance primary;
Instance ‘primary’ of database ‘primary’
Host Name: sagar-pc
Instance Status:
SUCCESS
DGMGRL> show instance secondary;
Instance ‘secondary’ of database ‘secondary’
Host Name: sagar-pc
Instance Status:
SUCCESS
SHUTDOWN Shuts down a currently running Oracle database. 
DGMGRL> connect sys/manager@primary
Connected.
DGMGRL> shutdown abort;
ORACLE instance shut down.
STARTUP Starts an Oracle instance with the same options as SQL*Plus, including mounting and opening a database. 
DGMGRL> connect sys/manager@primary
Connected.
DGMGRL> startup;
ORACLE instance started.
Database mounted.
Database opened.
SWITCHOVER Performs a switchover operation in which the current primary database becomes a standby database, and the specified standby database becomes the primary database. 
DGMGRL> switchover to secondary;
Performing switchover NOW, please wait…
New primary database “secondary” is opening…
Operation requires shutdown of instance “primary” on database “primary”
Shutting down instance “primary”…
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance “primary” on database “primary”
Starting instance “primary”…
Unable to connect to database
ORA-12514: TNS:listener does not currently know of service requested in connectdescriptor Failed. (IGNORE TNS error)
Warning: You are no longer connected to ORACLE.
Please complete the following steps to finish switchover:
start up and mount instance “primary” of database “primary”
 
              

Monday, August 15, 2016

log file sync in RAC 11GR1


Problem
Huge Log file sync waits during 3 hour run of ETL job (Extract, transform, Load) on one of the nodes of  2 Node RAC cluster.
Environment
2 Node RAC
Block Size: 16K
ClusterWare Version: 11.1.0.7
RDBMS and ASM Version: 11.1.0.7.9
OS : AIX 6.1
Storage Vendor : Oceanstor s2900
Number of LMS processes: 2 per node, running in Real Time priority as expected.

Background
Log file sync wait for Single Instance DB has been explained in depth in many blogs. I will take a moment to extend its definition for RAC and summarize the sequence of events involved in log file sync wait:
1.) User issues commit/rollback(ends with a commit) and wakes up log writer (LGWR). The wait time for log file sync starts here.
2.) LGWR issues/queues write call to OS to write to disk, refreshes the logical scn and asynchronously posts LMS to broadcast this SCN (BOC) from the local instance to other instances in cluster.
While I/O subsystem completes the write to disk, LGWR itself starts waiting on ‘Log File Parallel Write‘.
Wait counter for ‘wait for scn ack‘ also starts ticking now.
3.a) Once write is complete, OS posts back to LGWR. This is where wait counter for log file parallel write stops. For every slow write to disk (log write taking more than 500ms), a warning is written to LGWR tracefile “Warning: log write time 550ms, size 17KB 
3.b) By this time,  if local node has not received the SCN acknowledgement for the BOC post sent to remote nodes in step 2, LGWR continues to wait on “wait for scn ack” event. If this wait for scn acknowledgement is more than 500ms, another warning get logged to the lgwr tracefile “Warning: log write broadcast wait time 670ms”
Note: In 11.1 RAC, For a simple commit statement, it is easy to see this otherwise invisible wait event by setting “_lgwr_async_broadcasts”=false and by enabling 10046 trace on LGWR.
11.2 has another interesting parameter “_lgwr_posts_for_pending_bcasts”  which is set to False by default. I am yet to test its impact on log file sync waits.
4.) Once acknowledgement is received by LGWR from other instances , LGWR posts back to user that write is complete (commit is durable).
5.) Right after posting back to user session, log file sync wait counter stops.
Its worth noting that with all these steps involved in log file sync waits, CPU scheduling plays a crucial role in almost every step.  It is possible that LGWR doesn’t get CPU cycle immediately and has to wait in run-queue.  All this wait time would add to the user ‘log file sync’ wait time.

Analysis
Areas to analyze:
a.) Commit Frequency
b.) I/O response time
c.) CPU starvation
d.) Bugs relevant to BOC scheme.

Commit Frequency
Number of commits were high. Sections from AWR:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
             Snap Id           Snap Time       Sessions Curs/Sess
             ---------     ------------------- -------- ---------
 Begin Snap: 10525         17-Feb-12 02:00:53     118    2.1
 End Snap:   10526         17-Feb-12 03:00:04     158    2.7
 Elapsed:       59.17 (mins)
 DB Time:    1,020.28 (mins)
 
Top <del>5</del> 2 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                        Avg
                                                        wait   % DB
Event                            Waits         Time(s)  (ms)   time   Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                    12,689,306     36,776    3     60.1     Commit
DB CPU                                           5,164           8.4
 
Statistic                            Total            per Second     per Trans
-------------------------------- ------------------ -------------- -------------
redo size                            10,635,966,936    2,995,679.1         799.5
user I/O wait time                          608,253          171.3           0.1
user calls                               17,928,502        5,049.7           1.4
user commits                             12,801,929        3,605.7           1.0
user rollbacks                              501,433          141.2           0.0
I/O Response Time
LGWR tracefile showed some latency in write time:
1
2
*** 2012-02-16 02:53:21.691
Warning: log write time 550ms, size 17KB
Here is output from v$event_histogram for log file parallel write wait, for this db instance (started 3 days before the analysis):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
EVENT#             EVENT           WAIT_TIME_MILLI WAIT_COUNT
---------- ------------------------ --------------- ----------
 126       log file parallel write      1            59580471
 126       log file parallel write      2             3022225
 126       log file parallel write      4             1797856
 126       log file parallel write      8              624168
 126       log file parallel write     16              169960
 126       log file parallel write     32               46138
 126       log file parallel write     64               18218
 126       log file parallel write    128                4825
 126       log file parallel write    256                1188
 126       log file parallel write    512                 353
 126       log file parallel write   1024                  39
 126       log file parallel write   2048                   2
Redo Logfiles were/are not multiplexed. However, stored on same RAID 5 disk array as Datafiles. A new REDO_DG was created using disks from a different disk array than datafiles (though this one was also RAID 5) and redo logfiles from both instances were moved to a new disk array. Result: Little improvement.
1
2
*** 2012-02-18 03:26:35.293
Warning: log write time 500ms, size 9KB

In the snapshot above, updisk26 represents the disk holding the redo logfiles. In this snapshot, Busy% is Zero. However, there were some snapshots taken with busy% around 40 and an occasional 90.
As disk is part of RAID5 array, Parity calculation(even though it happens in background for newer SAN controllers) for every small write to disk might be adding to problems.
CPU Usage
Below is one of the many snapshots taken during the ETL run.  Tool used : NMON (AIX)
CPU usage peeked 100%. Reason:  An application Module is also running on the DB Host, which is hogging CPU time.

LGWR is not getting enough CPU cycles ?

Bugs relevant to BOC scheme
MOS has a really educative note (ID: 259454.1) explaining different SCN generation Schemes i.e. Lamport SCN Generation and Broadcast on Commit SCN generation Scheme. It also discusses the reason to switch to BOC scheme as default SCN generating Scheme starting 10gR2.
During the ETL job, following messages appeared in lgwr trace, which to me needed more research as to why and how this can impact log file sync waits.  Thats how I came up with the fresh write-up for the log file sync wait event for RAC.
1
2
3
4
5
6
7
8
9
*** 2012-02-17 02:17:33.868
Warning: log write broadcast wait time 670ms
*** 2012-02-17 22:30:54.316
Warning: log write broadcast wait time 3000ms
-- RAC Specific waits from AWR Report
Statistic                            Total            per Second     per Trans
-------------------------------- ------------------ -------------- -------------
broadcast on commit wait time(ms)      937,303         264.0          0.1
broadcast on commit waits            1,669,140         470.1          0.1
Reasons for these waits mostly point to a bug with symptoms of : huge wait time for ‘wait for scn ack’   or lgwr unable to handle multiple posts at the same time.
There are three bugs reported on top of 11.1.0.7 related to log file sync waits & BOC and all of them are worth looking at.
Bug 8490879 – Long “log file sync” latencies due to broadcast on commit scheme
Bug 7716356 – Long “log file sync” latencies with broadcast on commit scheme in RAC
Bug 7610362 – Long “log file sync” waits in RAC with broadcast on commit in RAC

Finally, Options to Consider
a.) Increase number of CPUs on the system.  Then, if needed, bump up the priority of LGWR (renice) at OS level.  If  needed put LGWR in the list of “_high_priority_processes”. LNS and VKTM are already in the list on 11.1
b.) RAID5 is said to provide reasonable sequential write performance, but I am yet to test it. RAID 0 (Striping only) might prove to be a better bet for log files. It is also possible that this I/O subsystem is unable to perform as expected with writes which are smaller than the Stripe size defined for RAID.
c.) Move ETL module of application hosted on the DB Server to a different host.
d.) Reduce the number of commits.
e.) Raise an SR to get a confirmation if problem is due to any known bugs with 11.1.0.7 version.