Performance Troubleshooting Series : Identifying Problematic Sessions or Queries (Method 1)
In the Previous Posts , to troubleshoot the problematic queries/long running session issues, we have sorted out two approaches, Here we discuss the Method 1
Method 1. Session
Wait Event Approach:- In this approach we will try to identify from the
problematic session what its waiting for , why its waiting, what can be
done to resolve the problem.
The
most common causes are explained here, however there may be many cases.
The events classified here are can impact the instance level hence
database performance can be degraded.
SQL> select sid,username,event,blocking_session,sql_id,prev_sql_id from v$session where username=’username’;
or
SQL> select sid,username,event,blocking_session,sql_id,prev_sql_id from v$session where sid=’sid’;
or
SQL> select session_id,username,event,blocking_session,sql_id,prev_sql_id from v$active_session_history where session_id=”;
Concentrate
on the Event column and the following events can represent you the
problem. Events are mainly classified as Administration, I/O,
Concurrency etc. And these notifies what the database is experiencing in
terms of waits. As such the waits more the problem in that particular
area is more.
For Example, you may have see the following output with above queries
<Screenshot>
As
you see above the event column shows most of the times the following
events and sessions are waiting for that event. Each wait event
represents the issue it has and followed by solution.
1. Latch Cache Buffer chains:-
Problem:-
As such more buffers (headers) are been read and trying to modify and
have a long list of chain for each buffer and repeatable reads (more
nested loops), the session get waited on doing latch cache buffer
chains.
Solution:-
Check the sql_id and generate execution plan, compare the execution
plan with previous plan from history, and you may see more nested loops.
This is due to in correct statistics which lead to more nested loops instead of hash joins.
Collect the statistics and you rerun the statement, flush the buffer pool will help.
Read more about Join methods Read here.
2. Library Cache Lock/Pin (Cursor mutex s/x):-
Problem
1:- More locks in shared pool for the objects and reread by multiple
sessions and lot of invalidations. Every time any execution happen
shared pool structure should be locked and pinned until the execution
completed again you do execution you have repin and lock, rather between
many executions you can do only one time lock/pin that will help to
reduce this issue.
Problem
2:- Invalidations (statistics collection can invalidate objects in
shared pool and hence to repin and relock again, so it can one time
issue but if its do more often then its a problem)
Solution:- 1. flush the shared pool,
2. kill the session that hold Lock/Pin X in event column.
3. Where the long term you can reduce this issue by setting session_cached_cursors to a reasonable number
and cursor_space_for_time to true.
|
3. Shared Pool Latch/Latch free:-
Problem:
Inadequate shared pool size or no dynamic resize operations happening
(v$sga_resize_ops), although the automatic memory management is set,
sometime you will need to do a manual resizes since MMAN is busy or
hanged sometimes. (from my experience I did a lot of times this stuff)
Solution:- Flush shared pool and adjust pools
1. alter system flush shared pool;
2. select name,value from v$sgastat where name like ‘free%’; Observe free memory for shared pool;
3. If you do not get much free memory, give a try to resize/adjust the pools.
4. Reduce the buffer a bit and increase the shared pool a bit. to allow some resize operations.
5. show parameter db_cache_size shared_pool_size 9809000 8. Alter system set shared_pool_size=10000000; (increased a bit which reduce from buffer pool)
Long Terms if you see this is frequently happening:
Possible increase of shared pool
if permits Set cursor_sharing to similar or force (see here)
|
4. Res: mgr quantum:-
Problem 1:- Sessions burning CPU or consuming more cpu or load increased, which the other sessions are waiting for CPU.
Solution:- Read here and and identify the CPU consuming processes and take appropriate actions.
|
5.
DB File Sequential Read:- An index reads usually faster but due to
index stale statistics, wrong index sessions can be doing more index
scans.
Problems:- Use of an unselective index , Fragmented Indexes , High I/O on a particular disk or mount point, Bad application design, Index reads performance can be affected by slow I/O subsystem and/or poor database files layout, which result in a higher waits on this wait event.
Solutions:-
Picked Wrong Index:- It may be the case that table column is involved in two indexes and optimizer has picked the wrong index in this case.
1. SQL> select sid,username,event,sql_id,row_wait_obj# from v$session where sid=”;
Note:- the row_wait_obj# tell which object_id that
session is wait on , with this we know which index by checking in
dba_objects with object_id and get name of index the session is waiting.
2. Identify the Plan for the query,
SQL> select * from table(dbms_xplan.display_cursor(‘sql_id’));
3. Identify (if any previous run history of the query)
SQL> select * from table(dbms_xplan.display_awr(‘sql_id’));
4. Compare the Both Plans and observer change in the index name in highlighted sections.
5. The reason would be , the statistics of index/table would have picked up wrong index (index clustering factor)
SQL> exec dbms_stats.gather_table_stats(‘OWNER’,’TABLE_NAME’,CASCADE=>TRUE,no_invalidate=>false);
6.
Rebuild the index that you want (probably the index that has picked up
earlier) or collect statistics of the table with cascade true option
would correct the issue.
SQL> alter index indexname rebuild online;
Picked Right Index only, but still slow:- It may be the case that is has picked up right index but still running slow
1. Identify the current plan and identify which index is it
SQL> select sid,username,event,sql_id,row_wait_obj# from v$session where sid=”;
Note:- the row_wait_obj# tell which object_id that
session is wait on , with this we know which index by checking in
dba_objects with object_id and get name of index the session is waiting.
2. Identify the Plan for the query, and check the index in the plan
SQL> select * from table(dbms_xplan.display_cursor(‘sql_id’));
3. Identify (if any previous run history of the query)
SQL> select * from table(dbms_xplan.display_awr(‘sql_id’));
4. Compare the Both Plans and observe that it is same index that picked all times but this time slow.
5. The reason would be , the statistics of index/table would have picked up wrong index (index clustering factor)
SQL> exec dbms_stats.gather_table_stats(‘OWNER’,’TABLE_NAME’,CASCADE=>TRUE,no_invalidate=>false);
6.
Rebuild the index that you want (probably the index that has picked up
earlier) or collect statistics of the table with cascade true option
would correct the issue.
SQL> alter index indexname rebuild online;
|
6. DB File Scattered Read:- Usually full tablescans
Problem:
May be that no index there for the table, a Index is present but
optimizer thinks that scanning via index is costly and hence do a full
tablescan. But why then optimizer decide and how it decides, it decides
by means of statistics available for the table and indexes . If
statistics were stale and volume of data that is retrieving by query is
close to number of rows in table lead to full table scans.
|
7. Direct Path Reads
|
8. Log File Sync/Log File Parallel Write
|
9. Log Buffer Space
|
10. Buffer Free Waits
|
11. Buffer Busy Waits
|
12. Enq: Tx Row Lock Contention
|
13. Enq: TM Contention
|