Search

Sunday, June 19, 2016

Oracle : buffer busy wait

Buffer Busy Waits usually happen on Oracle 10 and 11 mainly because of insert contention into tables or Indexes.  There are a few other rare cases of contention on old style RBS segments, file headers blocks and freelists.
Before Oracle 10 and 11 there was one other major reason which was readers waiting for readers, ie one user does a phyiscal IO of a block into memory and a second user want to read that block. The second user waits until the IO is finished by the first user. Starting in 10g this wait has been given the name “read by other session“.  Before Oracle 10g this was also a “buffer busy wait”.
The easiest way to analyse the bottleneck and find a solution is to use ASH (active session History) available in Oracle 10g with the diagnostics pack license or using Simulated ASH for free or using a product like DB Optimizer.
Data block class, which can be found in ASH,  is the most important piece of information in analysing buffer busy waits. If we know the block class we can determine what kind of bottleneck:
 If CLASS=
    1. data block
      • IF OTYPE =
      • INDEX , then the insert index leaf block is probably hot, solutions are
        • Hash partition the index
        • Use reverse key index
      • TABLE, then insert block is hot,solutions
        • Use free lists
        • Put Object in ASSM tablespace
    2. Segment header – If “segment header” occurs at the same time as CLASS= “data block” on the same object and the object is of OTYPE= “TABLE”  then this is just a confirmation that the TABLE needs to use free lists or  ASSM.
    3. File Header Block – Most likely extent allocation problems, look at extent size on tablespace and increase the extent size to there are few extent allocations and less contention on the File Header Block.
    4. free lists – Add free list groups to the object
    5. undo header – Not enough UNDO segments, if using old RBS then switch to AUM
    6. undo block – Hot spot in UNDO, application issue
How do we find the block class? With a quick query on the ASH data like:
select
       o.object_name obj,
       o.object_type otype,
       ash.SQL_ID,
       w.class
from v$active_session_history ash,
     ( select rownum class#, class from v$waitstat ) w,
      all_objects o
where event='buffer busy waits'
   and w.class#(+)=ash.p3
   and o.object_id (+)= ash.CURRENT_OBJ#
Order by sample_time;
For Example

OBJ    OTYPE  SQL_ID        CLASS
------ ------ ------------- ------------------
TOTO1  TABLE  8gz51m9hg5yuf data block
TOTO1  TABLE  8gz51m9hg5yuf data block
TOTO1  TABLE  8gz51m9hg5yuf segment header
TOTO1  TABLE  8gz51m9hg5yuf data block
If we find that CLASS=datablock, then we will want more information to diagnose, such as the object type “OTYPE” , object name and what kind of tablespace the object is stored in. The following query provides that information:
set linesize 120
col block_type for a20
col objn for a25
col otype for a15
col filen for 9999
col blockn for 9999999
col obj for a20
col tbs for a10
select
       bbw.cnt,
       bbw.obj,
       bbw.otype,
       bbw.sql_id,
       bbw.block_type,
       nvl(tbs.name,to_char(bbw.p1)) TBS,
       tbs_defs.assm ASSM
from (
    select
       count(*) cnt,
       nvl(object_name,CURRENT_OBJ#) obj,
       o.object_type otype,
       ash.SQL_ID sql_id,
       nvl(w.class,'usn '||to_char(ceil((ash.p3-18)/2))||' '||
                    decode(mod(ash.p3,2),
                         1,'header',
                         0,'block')) block_type,
       --nvl(w.class,to_char(ash.p3)) block_type,
       ash.p1 p1
    from v$active_session_history ash,
        ( select rownum class#, class from v$waitstat ) w,
        all_objects o
    where event='buffer busy waits'
      and w.class#(+)=ash.p3
      and o.object_id (+)= ash.CURRENT_OBJ#
      and ash.session_state='WAITING'
      and ash.sample_time > sysdate - &minutes/(60*24)
      --and w.class# > 18
   group by o.object_name, ash.current_obj#, o.object_type,
         ash.sql_id, w.class, ash.p3, ash.p1
  ) bbw,
    (select   file_id, 
       tablespace_name name
  from dba_data_files
   ) tbs,
    (select
 tablespace_name    NAME,
        extent_management  LOCAL,
        allocation_type    EXTENTS,
        segment_space_management ASSM,
        initial_extent
     from dba_tablespaces 
   ) tbs_defs
  where tbs.file_id(+) = bbw.p1
    and tbs.name=tbs_defs.name
Order by bbw.cnt
/
and the output looks like
  CNT OBJ     OTYPE   SQL_ID        BLOCK_TYPE       TBS        ASSM
----- ------- ------- ------------- ---------------- ---------- ------
    3 TOTO1   TABLE   8gz51m9hg5yuf segment header   NO_ASSM    MANUAL
   59 TOTO1   TABLE   8gz51m9hg5yuf data block       NO_ASSM    MANUAL
Oracle 7, 8 and 9

Before Oracle 10, buffer busy waits also happened because IO blocking another user wanting to do the same IO. On Oracle 9, the main reasons for buffer busy waits are
1)       IO read contention (only Oracle 9i and below)
2)       Insert Block Contention on Tables or Indexes
3)       Rollback Segment Contention
On  7.0  – 8.1.5 see http://sites.google.com/site/embtdbo/oracle-buffer-busy-wait/oracle-buffer-busy-wait-7-8-1-5
On version 8 and 9, the p3 value has a different meaning. Instead  of meaning the block type (which is the best thing to know) it means the kind of buffer busy wait. There are only two values that matter to us, values in
100 range = read waits (basically just an IO wait)
Reader blocking Reader, ie one reader is reading a block in and another person wants to read this block and waits on a buffer busy wait p3=130.
200 range = write contetion (same as in 10g)
Writers blocking other writers for example while doing inserts either because of no free lists on the table or because everyone is inserting into the same index block.
If you have set up ASH style collection with S-ASH or have a product like DB Optimizer you can run a query like:
select
       count(*) cnt,
       o.object_name obj,
       o.object_type otype,
       ash.CURRENT_OBJ#,
       ash.SQL_ID,
       decode(substr(ash.p3,1,1),1,'read',2,'write',p3) p3
from v$active_session_history ash,
      all_objects o
where event='buffer busy waits'
   and o.object_id (+)= ash.CURRENT_OBJ#
group by o.object_name, o.object_type, ash.sql_id, ash.p3,ash.CURRENT_OBJ#
order by cnt
/
And see what kind of buffer busy waits there are and what the objects are:
 CNT OBJ     OTYPE   CURRENT_OBJ#     SQL_ID P3
--- ------- ------- ------------ ---------- ------
  1                           -1 1375352856 read
  2                           -1  996767823 read
  2                           -1 2855119862 write
 17                           -1 1375352856 write
 89 TOTO1   TABLE         296030 1212617343 write
109                       296022 1212617343 write
Often the Current_obj# is -1 so we can’t figure out what the object is . There is an alternative method
col block_type for a18
col objn for a25
col otype for a15
col event for a15
col blockn for 999999
col segment_name for a20
col partition_name for a15
col owner for a15
set timing on
/*
drop table myextents;
l
create table myextents as select * from dba_extents;
l
*/
select
       count(*),
       ext.owner,
       ext.segment_name,
       ext.partition_name,
      ext.segment_type,
        decode(substr(ash.p3,1,1),1,'read',2,'write',p3) p3
       --ash.p1,
       --ash.p2
from v$active_session_history ash,
     myextents ext
where
       event = 'buffer busy waits'
   and ( current_obj# = -1 or current_obj#=0  or current_obj# is null )
   --and sample_time > sysdate - &minutes/(60*24)
   --and session_state='WAITING'
   and  ext.file_id(+)=ash.p1 and
        ash.p2 between  ext.block_id and ext.block_id + ext.blocks
group by
       ext.owner,
       ext.segment_name,
       ext.partition_name,
       ext.segment_type,
       p3
       --ash.p1,
       --ash.p2,
       --ash.sql_id
Order by count(*)
/
Because querying DBA_EXTENTS  is a slow operation, I made a copy of DBA_EXTENTS which will be faster to query.
CNT OWNER  SEGMENT_NAME   PARTITION_NAME  SEGMENT_TYPE  P3
--- ------ -------------- --------------- ------------- --------
  1 SYS    _SYSSMU2$                      TYPE2 UNDO    read
  1 SYS    _SYSSMU3$                      TYPE2 UNDO    write
This second option of getting the object from P1 and P2 (file and block) should probably be done only with the users consent, because we would have to create a copy of the dba_extent table which might take a long time if it’s big.
No ASH ?

If you don’t have ASH data you will have to do some guess work.
Block Class (block type)
The first step in finding out the source of buffer busy waits is looking at
     V$waitstats
This will tell us what kind of datablocks we have contention on.
File with contention
You can also get an idea of what file contains the object with the buffer busy waits by looking at:
    X$KCBFWAIT
Object with contention
Starting in version 9i there is the table
    v$segstat
That will list the objects with buffer busy waits.
If you are on version 7 or 8 good luck finding the object without setting up ASH style data collection.
Why do buffer busy waits happen?

To put it most succinctly, buffer busy waits happen because two users want to change a block at the same time. Two users can change the same block, or even same row “at the same time” ie without committing, but that’s different from the actual operation of modifying the block. The modification on the block in RAM, or computer memory, can only be done by one process at at time in order to avoid memory corruptions. Different users can modify different blocks at the same time but only one user or process can modify a the same block at a time.
In order to really understand what’s going on we have to take a look at how Oracle manages memory and block access and modifications.
Here is the layout of
Above is a diagram shows some of the essential parts of Oracle in regards to performance tuning.
In the machine memory are
  •     Oracle’s SGA, or System Global Area, a memory that is shared between Oracle users
  •     LGWR – log writer process
  •     DBWR – database writer process
  •     User1,2,3 … – user processes, in this case “shadow processes”
On the machine file system are
  • Redo log files
  • Data files
The SGA is composed of (among other things)
  • Log Buffer
  • Library Cache
  • Buffer Cache
What’s important for understanding buffer busy waits is how the buffer cache is managed. Here is view of the buffer cache with more components:
In order to access a block, a user (shadow process) has to get a latch (cache buffer chains latch) which protects buckets or linked lists of buffer headers. Once the header desired if found the latch is released. The buffer headers point to the actual data block in memory. Before modifying a block in memory a user has to lock the buffer header. The buffer header is locked any time a modification is made whether it is reading a block into memory or modifying a block that is already in memory. Usually the header is locked only for a brief amount of time but when there is a lot of concurrent access the buffer header can become a bottleneck.
BBW when readling data – read by other session
A buffer busy can happen on oracle 7,8 and 9 when one user is reading a block into memory and a second user wants to read that block. Instead of the second user trying to read that block into memory as well, they just wait for the first user to finish. Starting in Oracle 10, this kind of wait was renames “read by other session”
BBW on insert
If multiple concurrent users are inserting into a table that doesn’t have free lists or is not in an ASSM tablespace then all users will end up inserting into the same block, the first one on the free list and this block will become the hot block
by adding free lists or moving the table to an ASSM tablespace we will alleviate the bottleneck.
Multiple free lists:
The other option is ASSM or Automatic Segment Space Management which is set at the tablespace level.
In this case free block information is kept in Level 1 BMB (or bitmapped blocks). These Level 1 BMBs are chosen by a hash on the users process ID thus distributing the inserts across the table.
The inserts would look something like this (somewhat exaggerated drawing)
the ASSM BMB blocks take up more space in the table , about 1 extra block for every 16 data blocks and there is overhead first looking in the header/level 3 BMB block then going to the Level 2  then level 1 and finally to the datablock but all in all ASSM is worth reduced costs of management verses free lists.
Identifying and creating ASSM tablespaces
Which tablespaces are ASSM or not?
select
        tablespace_name,
        extent_management  LOCAL,
        allocation_type    EXTENTS,
        segment_space_management ASSM,
        initial_extent
from dba_tablespaces
TABLESPACE_NAME LOCAL      EXTENTS   ASSM
--------------- ---------- --------- ------
SYSTEM          LOCAL      SYSTEM    MANUAL
UNDOTBS1        LOCAL      SYSTEM    MANUAL
SYSAUX          LOCAL      SYSTEM    AUTO
TEMP            LOCAL      UNIFORM   MANUAL
USERS           LOCAL      SYSTEM    AUTO
EXAMPLE         LOCAL      SYSTEM    AUTO
DATA            LOCAL      SYSTEM    MANUAL
creating an ASSM tablespace:
create tablespace data2 
datafile '/d3/kyle/data2_01.dbf' 
size 200M
segment space management auto;
BBW on index (because of insert)
If users are inserting data that has a rising key value, especially a monotonically rising value, then all the new inserts will have to update the leading edge leaf block of the index and with high concurrent inserts this can cause buffer busy waits.
Solutions
Hash partition the index
Reverse Key Index

BBW on old style RBS
IF block class > 18 it’s an old style RBS segment
Select  CURRENT_OBJ#||' '||o.object_name objn,
          o.object_type otype,
          CURRENT_FILE# filen,
          CURRENT_BLOCK# blockn,
          ash.SQL_ID,
          w.class ||' '||to_char(ash.p3) block_type
from v$active_session_history ash,
      (select rownum class#, class from v$waitstat ) w,
       all_objects o
where event='buffer busy waits'
    and w.class#(+)=ash.p3
    and o.object_id (+)= ash.CURRENT_OBJ#
Order by sample_time;    
OBJN        OTYPE  FILEN  BLOCKN SQL_ID        BLOCK_TYPE
----------- ------ ------ ------ ------------- ------------
54962 TOTO1 TABLE     16   45012 8gz51m9hg5yuf data block 
54962 TOTO1 TABLE     16     161 8gz51m9hg5yuf segment header
0                     14       9 8gz51m9hg5yuf  87
0                     14       9 8gz51m9hg5yuf  87
IF the block is of class > 18, the there will be no object name, so we have to look it up ourselves to be sure:
select  segment_name,  
            segment_type
from     dba_extents 
where
     &P2  between 
    block_id and block_id + blocks – 1
     and 
     file_id = &P1 ;
Plug in 14 for P1 the file # and 9 for P2 the block number:
SEGMENT_NAME   SEGMENT_TYPE
-------------- --------------
R2             ROLLBACK
solution
move to new AUM or Automatic Undo Mangement
alter system set undo_management=auto  scope=spfile;
BBW on a file header
The ASH data has two different fields that indicate the file # and block # when the wait is a buffer busy wait.
For a buffer busy wait
    File # = p1  *and* File # = current_file#
    Block # = P2  *and* Block # = current_block#
if  p1 != current_file#  or p2 != current_block# then use p1 and p2. They are more reliable.
for example
Time   P1  P2 OBJN OTYPE FN BLOCKN BLOCK_TYPE
----- --- --- ---- ----- -- ------ -----------------
11:44 202   2 -1          0      0 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
11:44 202   2 TOTO TABLE  1  60218 file header block
Notice P1 != BLOCKN (BLOCKN is CURRENT_BLOCK#)  and P2 != FN (FN is CURRENT_FILE#)
The real file # is P1 =202 and block # is P2 which is 2
In my database I only had 10 files, so what is this file# 202?!
Solution
If you are getting buffer busy waits on the file header block for a tempfile (datafile in a temporary tablespace) then try increasing the “next extent” size in the temporary tablespace.
This wait can happen when lots of extents are being allocated in the temporary tablespace.
What Would ADDM do?
Interstingly enough the ADDM page doesn’t show the new load that has recently come on the system but the analysis is there.  I clicked on the next to bottom line in the page, “Read and write contention on database blocks was consuming significant database time.
Here are the outputs for the different scenarios.
 inserts into a table contention
 inserts into a table with contention on index
RBS contention
File Header Contention