Search

Tuesday, June 21, 2016

Performance Troubleshooting Series : Identifying Problematic Sessions or Queries (Method 1)

 

In the Previous Posts , to troubleshoot the problematic queries/long running session issues, we have sorted out two approaches, Here we discuss the Method 1
Method 1. Session Wait Event Approach:- In this approach we will try to identify from the problematic session what its waiting for , why its waiting, what can be done to resolve the problem.
The most common causes are explained here, however there may be many cases. The events classified here are can impact the instance level hence database performance can be degraded.
SQL> select sid,username,event,blocking_session,sql_id,prev_sql_id from v$session where username=’username’;
or
SQL> select sid,username,event,blocking_session,sql_id,prev_sql_id from v$session where sid=’sid’;
or
SQL> select session_id,username,event,blocking_session,sql_id,prev_sql_id from v$active_session_history where session_id=”;
Concentrate on the Event column and the following events can represent you the problem. Events are mainly classified as Administration, I/O, Concurrency etc. And these notifies what the database is experiencing in terms of waits. As such the waits more the problem in that particular area is more.
For Example, you may have see the following output with above queries
<Screenshot>
As you see above the event column shows most of the times the following events and sessions are waiting for that event. Each wait event represents the issue it has and followed by solution.

 

1. Latch Cache Buffer chains:-
Problem:- As such more buffers (headers) are been read and trying to modify and have a long list of chain for each buffer and repeatable reads (more nested loops), the session get waited on doing latch cache buffer chains.
Solution:- Check the sql_id and generate execution plan, compare the execution plan with previous plan from history, and you may see more nested loops.
This is due to in correct statistics which lead to more nested loops instead of hash joins.
Collect the statistics and you rerun the statement, flush the buffer pool will help.
Read more about Join methods Read here.

2. Library Cache Lock/Pin (Cursor mutex s/x):-
Problem 1:- More locks in shared pool for the objects and reread by multiple sessions and lot of invalidations. Every time any execution happen shared pool structure should be locked and pinned until the execution completed again you do execution you have repin and lock, rather between many executions you can do only one time lock/pin that will help to reduce this issue.
Problem 2:- Invalidations (statistics collection can invalidate objects in shared pool and hence to repin and relock again, so it can one time issue but if its do more often then its a problem)
 Solution:- 1. flush the shared pool,
                2. kill the session that hold Lock/Pin X in event column.
                3. Where the long term you can reduce this issue by setting session_cached_cursors to a reasonable number
                     and cursor_space_for_time to true.
3. Shared Pool Latch/Latch free:-
Problem: Inadequate shared pool size or no dynamic resize operations happening (v$sga_resize_ops), although the automatic memory management is set, sometime you will need to do a manual resizes since MMAN is busy or hanged sometimes. (from my experience I did a lot of times this stuff)
Solution:- Flush shared pool and adjust pools
1. alter system flush shared pool; 2. select name,value from v$sgastat where name like ‘free%’; Observe free memory for shared pool; 3. If you do not get much free memory, give a try to resize/adjust the pools. 4. Reduce the buffer a bit and increase the shared pool a bit. to allow some resize operations.
5.  show parameter db_cache_size
                                                    db_cache_size         10010000
6. Alter system set db_cache_size=9000000 (reduce a bit); 7. show parameter shared_pool_size
                               shared_pool_size   9809000
8. Alter system set shared_pool_size=10000000; (increased a bit which reduce from buffer pool)
                         
Long Terms if you see this is frequently happening: Possible increase of shared pool if permits Set cursor_sharing to similar or force (see here)
4. Res: mgr quantum:-
Problem 1:- Sessions burning CPU or consuming more cpu or load increased, which the other sessions are waiting for CPU.
                      Solution:- Read here and and identify the CPU consuming processes and take appropriate actions.
5. DB File Sequential Read:-  An index reads usually faster but due to index stale statistics, wrong index sessions can be doing more index scans.
Problems:- Use of an unselective index , Fragmented Indexes , High I/O on a particular disk or mount point, Bad application design, Index reads performance can be affected by slow I/O subsystem and/or poor database files layout, which result in a higher waits on this wait event.
Solutions:-
Picked Wrong Index:- It may be the case that table column is involved in two indexes and optimizer has picked the wrong index in this case.
                 1. SQL> select sid,username,event,sql_id,row_wait_obj# from v$session where sid=”;
                  Note:- the row_wait_obj# tell which object_id that session is wait on , with this we know which index by checking in dba_objects with object_id and get name of index the session is waiting.
2. Identify the Plan for the query,
       SQL> select * from table(dbms_xplan.display_cursor(‘sql_id’));
3. Identify (if any previous run history of the query)
       SQL> select * from table(dbms_xplan.display_awr(‘sql_id’));
4. Compare the Both Plans and observer change in the index name in highlighted sections.
5. The reason would be , the statistics of index/table would have picked up wrong index (index clustering factor)
        SQL> exec dbms_stats.gather_table_stats(‘OWNER’,’TABLE_NAME’,CASCADE=>TRUE,no_invalidate=>false);
6. Rebuild the index that you want (probably the index that has picked up earlier) or collect statistics of the table with cascade true option would correct the issue.
        SQL>  alter index indexname rebuild online;
Picked Right Index only, but still slow:- It may be the case that is has picked up right index but still running slow  
1. Identify the current plan and identify which index is it
                        SQL> select sid,username,event,sql_id,row_wait_obj# from v$session where sid=”;
                  Note:- the row_wait_obj# tell which object_id that session is wait on , with this we know which index by checking in dba_objects with object_id and get name of index the session is waiting.
2. Identify the Plan for the query, and check the index in the plan 
       SQL> select * from table(dbms_xplan.display_cursor(‘sql_id’));
3. Identify (if any previous run history of the query)
       SQL> select * from table(dbms_xplan.display_awr(‘sql_id’));
4. Compare the Both Plans and observe that it is same index that picked all times but this time slow.
5. The reason would be , the statistics of index/table would have picked up wrong index (index clustering factor)
        SQL> exec dbms_stats.gather_table_stats(‘OWNER’,’TABLE_NAME’,CASCADE=>TRUE,no_invalidate=>false);
6. Rebuild the index that you want (probably the index that has picked up earlier) or collect statistics of the table with cascade true option would correct the issue.
        SQL>  alter index indexname rebuild online;
6. DB File Scattered Read:- Usually full tablescans
Problem: May be that no index there for the table, a Index is present but optimizer thinks that scanning via index is costly and hence do a full tablescan. But why then optimizer decide and how it decides, it decides by means of statistics available for the table and indexes . If statistics were stale and volume of data that is retrieving by query is close to number of rows in table lead to full table scans.

7. Direct Path Reads
8. Log File Sync/Log File Parallel Write
9. Log Buffer Space
10. Buffer Free Waits
11. Buffer Busy Waits
12. Enq: Tx Row Lock Contention
13. Enq: TM Contention