Search

Tuesday, June 21, 2016

Performance Troubleshooting Series : Troubleshooting Instance Memory Structures (SGA + PGA) Usage

In the previous series we have seen how to troubleshoot IO, CPU and Database TimeSpent/Consumed,
Today’s Post I will be sharing with you the most common issues or to tackle memory related questions.
As we all know Oracle does manage things (blocks/modifications) in memory and track them in memory and sort them in memory and store them in memory and users first read the memory structures only not the disk.
So how big the memory should be, Does that memory is adequate sized, if not what kind of waits you will be seeing specifically for those memory structures like PGA, Shared Pool, Buffer Cache etc.
I will be starting as of 10g only, these so called memory structures are managed by oracle by MMAN & MMNL processes when sga_target/memory_max_target is set > 0. That means if any of the pools with in SGA is required to adjust with in themselves these background process help to do that for you. In earlier versions , DBA has to manually resize them as require during batch timings and during day timings (I am the one of those DBA too who did it often previously). So that relieves a major work from DBA.  But there are many questions about SGA utilization and PGA utilization and sizing and historical usage.
Okay, now proceed with typical question and answers and related troubleshooting tips.

 

Okay, now proceed with typical question and answers and related troubleshooting tips. 
Question
Views Needed
Description

A new database, How much SGA/PGA you need?
None
This is tricky,  you cannot fit entire database into memory.
So ideally when a new database anticipating the common sizing start with 2g for medium and 4g for larger instances and you will need to capacity plan until you have some baseline. To capacity plan the next row will answer the question

A older database, Capacity Plan the SGA/PGA?
Current:-
v$sga_target_advice
v$pga_target_advice
v$shared_pool_advice
v$db_cache_advice
v$memory_target_advice
historical:-
dba_hist*
To Understand Current/Historical SGA sizing and performance 
SQL> select * from v$sga_target_advice;           SQL> select * from dba_hist_sga_target_advice where snap_id =&id;
http://db.geeksinsight.com/wp-content/uploads/2015/10/sga_target.jpg
                               http://db.geeksinsight.com/wp-content/uploads/2015/10/hist_sga_target.jpg
Notes:- See at column sga_size_factor,   where there is 1, its your current setting of sga, so when you have this setting size for example in above case 13gb the number of estimated physical reads were 387416890, if we increase to double the size i.e where the sga_size_factor is 2, i.e 27gb (last line) our database just have 113358182 physical reads. So there may be benefit. If benefit is less that means your sga is adequate and need to increase. Not only physical reads you can see the estimated db time from 7048407 to 6947617 it varies, that means I dont have benefit in terms of DB time but may benefit from physical reads.
If you are running a 11g memory management features i.e SGA+PGA combined use v$memory_target_advice instead.
To Understand Current/Historical PGA Sizing and performance
SQL> select * from v$pga_target_advice;               SQL> select * from dba_hist_pga_target_advice where snap_id=&id;
Good one                                                       Bad One
http://db.geeksinsight.com/wp-content/uploads/2015/10/pga_target_good.jpg      http://db.geeksinsight.com/wp-content/uploads/2015/10/pga_target_bad.jpg
What you have observed between good and bad screenshots?
The current value again stays with pga_target_factor column and where it shows 1, that means the current pga is 9gb, where in the difference is and a rule of thumb the estimated_overalloc_count should always be 0 at your current setting . If not you may need to consider increasing the pga_aggregate_target parameter, So in Good screenshot at size 1 it has 0 estimated_overalloc_count where in the Bad screenshot it shows that 94 times oracle has to do more asking os calls/internal to increase pga. Which is not good.
Although pga_aggregate_target parameter if you set at ex: 9gb , if its required session server process will use more than 9gb from os free memory with sysop calls. but if its passing that limit of 9gb it has to fetch from OS free memory and increase estimated overalloc count,  if its not able to get free memory then session/server process fails  with ora-4030 error, process out of memory error.  So pga limit is just a soft limit until 11g, in 12c you can restrict this behaviour with parameter pga_aggregate_limit
With above, you might have understood the sizing of pga and sga setting in database is adequate or not, and those historical ones too.  This will be a starting point to any of the DBA’s to estimate the sga/pga sizing.
If you want to understand about specifically buffer_pool and shared_pool advisory you can read from here

Session/Statement utilization of SGA/PGA  ?
v$sgastat
v$pgastat
v$process_memory
v$process_memory_detail
v$sesstat
v$active_session_history
dba_hist_active_sess_history
For SGA:-
To understand current sizes for each pool in the instance?
SQL> select pool,sum(bytes)/1024/1024 from v$sgastat group by pool;
http://db.geeksinsight.com/wp-content/uploads/2015/10/sga_stat.jpg
To Understand current free size with in each in the instance?
SQL> select pool,name,sum(bytes)/1024/1024 from v$sgastat where name=’free memory’ group by pool,name;
http://db.geeksinsight.com/wp-content/uploads/2015/10/sga_stat_free.jpg
To Understand current utilization of shared pool by each object that occupied,
SQL> select type,round(sum(sharable_mem)/1024/1024,2) from v$db_object_cache group by type order by 2 desc;
http://db.geeksinsight.com/wp-content/uploads/2015/10/shared_pool_usage_by_object.jpg
As you see above cursors i.e plain sql statements  using 2.2 gb and cursor statistics taking 450mb of utilization and rest others. So here most of my shared pool utilization is going for simple sql statements. If you want you can further break this query with specific user group by.
To understand current utilization by each statement,
 SQL> select sql_id,sum(sharable_mem) from v$sql where rownum < 100  group by sql_id order by 2 desc;
 http://db.geeksinsight.com/wp-content/uploads/2015/10/shared_pool_usage_by_statement2.jpg
For PGA:-
To Understand total Process memory usage overall,
SQL> select category,sum(allocated)/1024/1024 AllocMB,sum(used) UsedMB, sum(max_allocated)/1024/1024 MaxAllocMB from v$process_memory  group by category;
http://db.geeksinsight.com/wp-content/uploads/2015/10/pga_overall.jpg
To Understand specific session PGA utilization under which area.
SQL> SELECT s.sid,pm.allocated/1024/1024 Alloc_MB, used/1024/1024 UsedMB, Max_allocated/1024/1024 MaxAllocMB FROM v$session s, v$process p , v$process_memory pm WHERE s.paddr = p.addr AND p.pid = pm.pid AND s.sid IN (&1) ORDER BY sid, category;
http://db.geeksinsight.com/wp-content/uploads/2015/10/pga_session.jpg
As you see above the majority area goes to Other section of PGA consumption that means, Hash/Sorting etc. We need to understand what causing more pga utilization using workarea active. Work areas are sort area and hash area with in PGA to store the data that need to hash grouped or sorted.
SQL> select inst_id,sid,sql_id,operation_type,sum(actual_mem_used)/1024/1024,sum(work_area_size)/1024/1024, sum(tempseg_size)/1024/1024 from gv$sql_workarea_active  group by inst_id,sid,sql_id,operation_type;
http://db.geeksinsight.com/wp-content/uploads/2015/10/workarea_active.jpg
As you see which SID, SQL_ID and which area of workarea i.e hashing or sorting using more PGA can be found easily above.
Run the current_pga_by_session.sql also provide the information overall usage for each session as below.
http://db.geeksinsight.com/wp-content/uploads/2015/10/pga_by_session.jpg
From 11R2 Onwards you can get individual pga allocation from v$active_session_history only.
sql> select session_id,event,status,session_state,pga_allocated,temp_space_allocated from v$active_session_history where session_id=”;
For Historical you run
sql>  select session_id,event,status,session_state,pga_allocated,temp_space_allocated from dba_hist_active_sess_history where session_id=&sessionid and snap_id between &BEGIN_SNAP and &END_SNAP;
From historical metrics, you can access the pga allocation for each snap by
SQL> select snap_id,metric_name,max_val,avg_val from dba_hist_sysmetric_summary where metric_name=’Total PGA Allocated’;
As you got SQL_ID & SID you can see further where is full table scan is happening and reducing the full scan can eliminate hash area usage thus by pga consumption. And for sorts look out data and if it really needed to keep order by clause.

Does instance/sessions is experiencing any memory related waits?
v$system_event
v$session
v$active_session_history
dba_hist_active_sess_history
dba_hist_system_event
 If you have inadequate sga or pga there may be some waits that user experience, the following are the most common waits that can visible in wait event column for those sessions.
For SGA:-
For Buffer Pool:- free buffer wait, either your buffer pool is inadequate or dbwr is not enough to write quickly to the disk and the session waiting on this wait event to keep their blocks into buffer cache and waiting.
For Shared Pool:- latch free or latch: shared pool, these waits will be occur when there is in adequate shared pool allocated to instance and waiting to get an allocation of memory structure to parse query (SQLAREA)
For log_buffer:- redo buffer allocation retries, although this pool is not adjusted dynamically and if you less size allocated you will see this wait even its time to increase the log buffer too.
For PGA:-
direct path read, direct path read direct events appear if there is more PGA is consumed actually its a adversed affect, direct path reads consume pga memory and keep buffers in pga rather sga buffer pool. This is more in 11gr2 which should be addressed properly.
Note: latch cache buffer chains, library cache lock, cursor pin x/s, row cache lock, buffer busy wait etc, are related to concurrency issues not memory related waits.
Identify Wait Events from Current (V$session/v$active_session_history) or from AWR/Statspack
For Current:- 

Common Errors that can be caused by inadequate memory structures
Alert Log
Session/Query Failures
For Shared Pool:- ora-4031
Usually a statement failed or in alert you will see this 
ORA-04031: “unable to allocate n bytes of shared memory (“shared pool”, “object_name”, “alloc type(2,0)” …)
Few things note in above line, which pool Shared Pool, object name is namespace in db_object_cache like sql area, plsql or table or which part of shared pool structure it is unable to allocate, and how many bytes it unable to.
Generally, whenever a new query is need to parsed that need to obtain a latch for specific size bucket for the query size and memory structure is obtained and parsing will be done. In this process when a memory structure is not available for that query size particularly then its leads to this issue.
However small sized shared pool also can lead this since there is no free memory and no latch to obtain therefore no memory structure. Hence query fails.
Again, shared pool is divided into multiple subpools with number of buckets with different sizes, you can see this output and query to check in your database shared_pool_buckets.sql
http://db.geeksinsight.com/wp-content/uploads/2015/10/shared_pool_buckets1.jpg
As you see I have two pools (sgaheap(1,0) and sgaheap(2,0) with in shared pool and each pool contains free memory chunks of different sizes and also reserved free memory. So if my query requires for example 4K chunk and it can go to any pool and obtain the free chunk out of it. As many queries fire against of same size and 4K chunk is 0 then if any other query goes to pool 1 and there is no free chunk available it fails with ora-4031 unable to allocate 4096 bytes (sqlarea…..) like this.
Unfortunately, even if you have pool 2 having 4k Chunk it will not allocated to the failed query. Since your query has acquired a latch for pool 1. 
How to resolve it,
So first identify whether your shared pool has free memory with in pool,
SQL> select pool,name,sum(bytes)/1024/1024 from v$sgastat where name=’free memory’ group by pool,name;
http://db.geeksinsight.com/wp-content/uploads/2015/10/sga_stat_free.jpg
So as such you have free memory available even still the query is failing means you got the above issue, i.e shared pool fragmentation, since the shared pool is divided into multiple pools and the free chunks are distributed evenly/proportionally to the pools. This is required to reduce the waiting on long list of latch and introduced from 8i I think.
Again, unluckily there is no hard fact rule how many pools you need for a given sized shared pool, so its upto you how many pools you need, try to reduce the pools to the average, for example give a size of shared pool about 10g you may have 2 to 3 pools, if so try to reduce to 2.
Note:- Do not reduce to 1 never, this may cause another issue long waiting of latching queue. since there is only one pool and latch list will be long and hence parsing time will be more (indirectly CPU consumption).
How to check how many pools and the parameter,
SQL> select distinct ksmdsidx from x$ksmss;
  KSMDSIDX
   ———-
        1
        2
SQL> Show parameter _kghdsidx_count  
          2
Note: Changing this parameter may need bounce
For PGA – ora-4030
ora-4030 , out of process memory, There are numerous possibilities for this issue, however in an normal running system and no issues before, the following may be the cause
1. Inadequate free memory at OS
If you see more resize operations like PGA_TARGET GROW always then the PGA is inadequate and you must increase the PGA or try to reduce the following.
Secondly, Identify where PGA utilization is more, 
SQL> select category,sum(allocated)/1024/1024 AllocMB,sum(used) UsedMB, sum(max_allocated)/1024/1024 MaxAllocMB from v$process_memory  group by category;
http://db.geeksinsight.com/wp-content/uploads/2015/10/pga_overall.jpg
1. More hash joins i.e more full table scans. Identify the tables/statements that involved in 
tablescans_by_session.sql                                                                   fulltable_scans_statements.sql
http://db.geeksinsight.com/wp-content/uploads/2015/10/scans_by_sessions1.jpg                                          http://db.geeksinsight.com/wp-content/uploads/2015/10/tablescan_by_statement1.jpg
2. More direct path reads, Identify tables that are involved in direct path reads which consume pga (11gr2) and avoid full tablescans on them
3. More Sorting, identify which statement having more sorts

Improving Shared Pool Efficiency
v$sqlstat
v$sqlarea
v$library_cache_memory
v$sql_shared_pool
First understand how is your shared pool utilization,
SQL> select round(bytes/(1024*1024),0) count, ‘Shared Pool Size’ info from v$sgainfo where name = ‘Shared Pool Size’  union all  select count(*), ‘Unique sql statements’ from v$sql_shared_cursor where child_number = 0  union all select count(*), ‘Same sql but not first copy’from v$sql_shared_cursor where child_number >= 1 union all select count(*), ‘Execution plans’ from v$sql_shared_cursor  union all select count(*), ‘Count from sqlarea’ from v$sqlarea;
http://db.geeksinsight.com/wp-content/uploads/2015/10/sharedpoolusage.jpg
As you see, the database is more unique statements than common statements and closer number of executions plans observed too. That means this database is firing more unique statements and cannot be shared
Secondly, how is library cache utilization under which namespace,
SQL> select * from v$library_cache_memory;
http://db.geeksinsight.com/wp-content/uploads/2015/10/lcmemory.jpg
As we observed above the library is used mostly in sqlarea i.e  cursors or statements not the others.
Solution:-1
1. Sharing your cursors at application level, if not possible at DB level with cursor_sharing=force or similar , read here
2. If you see in time model statistics the soft parse is taking more time, then increase session_cached_cursors and cursor_space_for_time=true. at parameter level. (mostly you will be seeing library cache lock and library cache latch or cursor pin s etc as wait events for this sessions)
3. Identify the plans that frequently changing by using sql_plan_changes.sql and fix them with profile may help.
4. Reduce the DDL while DML is running/batches or running to avoid more invalidations, like statistics collection, partition maintenance
Run this query it will give overall picture of shared pool utilization  of top 5 statements in terms of disk/buffer/memory/cpu/executions:- current_expensive_queries.sql

Improving Buffer Pool Efficiency
v$bh
v$buffer_pool
x$bh
First understand how many buffers you have in the buffer pool at the moment.
SQL> SELECT name, block_size, SUM(buffers) FROM V$BUFFER_POOLGROUP BY name, block_size HAVING SUM(buffers) > 0;
NAME                 BLOCK_SIZE SUM(BUFFERS)
——————– ———- ————
DEFAULT                    8192      1174567
Second identify which object taking more buffer pool usage buffer_pool_top_object.sql
 http://db.geeksinsight.com/wp-content/uploads/2015/10/bufpool_objects.jpg
 Next, identify those statements which consuming more memory with in buffer pool and check those execution plan, Again if full tablescans happening try to reduce them or may be check the logic or may increase the buffer pool.
SQL> select to_char(last_active_time,’DD-MON HH24:MI’), sql_id,sql_text,executions,rows_processed/executions, disk_reads/executions,buffer_gets/executions from v$sql where sql_text like ‘%&top_table_above_in_buffer_pool%’ order by last_active_time,buffer_gets;
http://db.geeksinsight.com/wp-content/uploads/2015/10/buffer_pool_statements.jpg