
Tuesday, June 28, 2016

Statistic stats

What System Statistics Are Available?
There are two kinds of system statistics: noworkload statistics and workload statistics. The main difference between the two is the method used to measure the performance of the disk I/O subsystem. Whereas the former runs a synthetic benchmark, the latter uses an application benchmark. In both cases the performance of the CPU is computed with a synthetic benchmark. Before discussing in detail the difference between these two approaches, let’s see how system statistics are stored in the data dictionary.


SYSSTATS_INFO is the set containing the status of system statistics and when they were gathered. If they're correctly gathered, STATUS is set to COMPLETED. If there's a problem during the gathering of statistics, STATUS is set to BADSTATS, in which case the system statistics aren't used by the query optimizer. Two more values may be seen during the gathering of workload statistics: MANUALGATHERING and AUTOGATHERING. The attribute FLAGS takes the following values:
• 0 if the system statistics were set to the default values by calling the delete_system_stats procedure
• 1 if the system statistics were regularly gathered or set
• 128 if the system statistics were restored by calling the restore_system_stats procedure
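As a quick illustration of the FLAGS values above, deleting the system statistics resets them to defaults and sets FLAGS to 0 (a sketch, assuming a user with the required privileges):

```sql
-- Reset system statistics to their default values; FLAGS becomes 0
SQL> execute dbms_stats.delete_system_stats

-- Verify the flag
SQL> SELECT pval1
  2  FROM sys.aux_stats$
  3  WHERE sname = 'SYSSTATS_INFO' AND pname = 'FLAGS';
```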

SQL> SELECT pname, pval1, pval2
  2  FROM sys.aux_stats$
  3  WHERE sname = 'SYSSTATS_INFO';

PNAME   PVAL1  PVAL2
------- ------ -----------------
DSTART         10-25-2013 23:26
DSTOP          10-25-2013 23:28
FLAGS   1
STATUS         COMPLETED


SYSSTATS_MAIN is the set containing the system statistics themselves. Detailed information about them is provided in the next section.


SQL> SELECT pname, pval1
  2  FROM sys.aux_stats$
  3  WHERE sname = 'SYSSTATS_MAIN';

PNAME       PVAL1
----------- ------------
CPUSPEEDNW  1991.0
IOSEEKTIM   10.0
IOTFRSPEED  4096.0
SREADTIM    1.6
MREADTIM    7.8
CPUSPEED    1992.0
MBRC        21.0
MAXTHR      659158016.0
SLAVETHR    34201600.0



SYSSTATS_TEMP is the set containing values used for the computation of system statistics.
It’s available only while gathering workload statistics.

Gathering System Statistics

Noworkload Statistics:

SQL> execute dbms_stats.gather_system_stats(gathering_mode => 'noworkload')

 

 Workload Statistics
Workload statistics are available only when explicitly gathered. To gather them, you can’t use an idle system because the database engine has to take advantage of the regular database load to measure the performance of the disk I/O subsystem. On the other hand, the method for noworkload statistics is used to measure the speed of the CPU. As shown in Figure 7-2, gathering workload statistics is a three-step activity. The idea is that to compute the average time taken by an operation, it’s necessary to know how many times that operation was performed and how much time was spent executing it.
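The three steps can be sketched as follows (the benchmark in between stands for your regular application workload; the 'start'/'stop' gathering modes appear again later in this post):

```sql
-- Step 1: take a snapshot of the performance figures
SQL> execute dbms_stats.gather_system_stats(gathering_mode => 'start')

-- Step 2: let the regular application workload run for a representative period

-- Step 3: compute and store workload statistics from the deltas
SQL> execute dbms_stats.gather_system_stats(gathering_mode => 'stop')
```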

 

SQL> SELECT sum(singleblkrds) AS count, sum(singleblkrdtim)*10 AS time_ms
2 FROM v$filestat;
COUNT TIME_MS
---------- ----------
22893 36760


SQL> REMARK run a benchmark to generate some disk I/O operations...


SQL> SELECT sum(singleblkrds) AS count, sum(singleblkrdtim)*10 AS time_ms
2 FROM v$filestat;
COUNT TIME_MS
---------- ----------
54956 236430 

SQL> SELECT round((236430-36760)/(54956-22893),1) AS avg_tim_singleblkrd
2 FROM dual;


AVG_TIM_SINGLEBLKRD
-------------------
6.2 

 

 Restoring System Statistics
Whenever system statistics are changed through the dbms_stats package, instead of simply overwriting the current statistics with the new statistics, the current statistics are saved in another data dictionary table (wri$_optstat_aux_history) that keeps a history of all changes occurring within a retention period. The purpose is to be able to restore old statistics in case new statistics lead to inefficient execution plans.

For the purpose of restoring old statistics, the dbms_stats package provides the restore_system_stats procedure. This procedure accepts a single parameter: a timestamp specifying the target time. Statistics are restored to those that were in use at that specific time. For example, the following PL/SQL block restores the system statistics that were in use one day ago.


BEGIN
  dbms_stats.delete_system_stats();
  dbms_stats.restore_system_stats(as_of_timestamp => systimestamp - INTERVAL '1' DAY);
END;
/

 

Table Statistics

 The following query shows how to get the most important table statistics for a table:


SQL> SELECT num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len
  2  FROM user_tab_statistics
  3  WHERE table_name = 'T';

NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE CHAIN_CNT AVG_ROW_LEN
-------- ------ ------------ --------- --------- -----------
    1000     44            0         0         0         266


Here is an explanation of the table statistics returned by this query:
• num_rows is the number of rows in the table.
• blocks is the number of blocks below the high watermark in the table.
• empty_blocks is the number of blocks above the high watermark in the table. This value isn’t
computed by the dbms_stats package. It’s set to 0 (unless another value is already stored in
the data dictionary).
• avg_space is the average free space (in bytes) in the table’s data blocks. This value isn’t
computed by the dbms_stats package. It’s set to 0 (unless another value is already stored in
the data dictionary).
• chain_cnt is the sum of the rows in the table that are chained or migrated to another block
(chained and migrated rows are described in Chapter 16). Even though the query optimizer
uses this value, the dbms_stats package doesn’t compute it. It’s set to 0 (unless another value
is already stored in the data dictionary).
• avg_row_len is the average size (in bytes) of a row in the table.
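These values are populated by gathering the statistics with dbms_stats; a minimal sketch for the test table T used in the query above (assuming it belongs to the current user):

```sql
-- Gather table, column, and (by default) index statistics for T
SQL> execute dbms_stats.gather_table_stats(ownname => user, tabname => 'T')
```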

 

Column Statistics
 

The following query shows how to get the most important column statistics for a table:


SQL> SELECT column_name AS "NAME",
2 num_distinct AS "#DST",
3 low_value,
4 high_value,
5 density AS "DENS",
6 num_nulls AS "#NULL",
7 avg_col_len AS "AVGLEN",
8 histogram,
9 num_buckets AS "#BKT"
10 FROM user_tab_col_statistics
11 WHERE table_name = 'T';


 

Here is an explanation of the column statistics returned by this query:
• num_distinct is the number of distinct values in the column.
• low_value is the lowest value in the column. It’s shown in the internal representation. Note
that for string columns (in the example, the pad column), only the first 32 bytes (64 bytes as of
version 12.1) are used.
• high_value is the highest value in the column. It’s shown in the internal representation.
Notice that for string columns (in the example, the pad column), only the first 32 bytes (64
bytes as of version 12.1) are used.




LOW_VALUE AND HIGH_VALUE FORMAT
Unfortunately, the columns low_value and high_value aren't easily decipherable. In fact, they display the values using the binary internal representation used by the database engine to store data. To convert them to human-readable values, there are two possibilities.
First, the utl_raw package provides the functions cast_to_binary_double, cast_to_binary_float, cast_to_binary_integer, cast_to_number, cast_to_nvarchar2, cast_to_raw, and cast_to_varchar2. As the names of the functions suggest, for each specific datatype, there is a corresponding function used to convert the internal value to the actual value. For instance, to get the low and high value of the val1 column, you can use the following query:

SQL> SELECT utl_raw.cast_to_number(low_value) AS low_value,
2 utl_raw.cast_to_number(high_value) AS high_value
3 FROM user_tab_col_statistics
4 WHERE table_name = 'T'
5 AND column_name = 'VAL1';
LOW_VALUE HIGH_VALUE
--------- ----------
       39         63

Second, the dbms_stats package provides the procedures convert_raw_value (which is overloaded several
times), convert_raw_value_nvarchar, and convert_raw_value_rowid. Notice that to avoid using a PL/SQL
block, the following query uses the version 12.1 possibility to declare PL/SQL functions and procedures in the WITH clause. The purpose of the query is the same as the previous one (in the object_statistics.sql script, you find a variation of this query that supports all the most common datatypes):

SQL> WITH
2 FUNCTION convert_raw_value(p_value IN RAW) RETURN NUMBER IS
3 l_ret NUMBER;
4 BEGIN
5 dbms_stats.convert_raw_value(p_value, l_ret);
6 RETURN l_ret;
7 END;
8 SELECT convert_raw_value(low_value) AS low_value,
9 convert_raw_value(high_value) AS high_value
10 FROM user_tab_col_statistics
11 WHERE table_name = 'T'
12 AND column_name = 'VAL1'
13 /

LOW_VALUE HIGH_VALUE
--------- ----------
       39         63


• density is a decimal number between 0 and 1. Values close to 0 indicate that a restriction on that column filters out the majority of the rows. Values close to 1 indicate that a restriction on that column filters almost no rows. If no histogram is present, density is 1/num_distinct. If a histogram is present, the computation differs and depends on the type of histogram. In any case, as of version 10.2.0.4, for columns with histograms, this value is only used for backward compatibility when the optimizer_features_enable initialization parameter is set to an older release.
• num_nulls is the number of NULL values stored in the column.
• avg_col_len is the average column size in bytes.
• histogram indicates whether a histogram is available for the column and, if it’s available, which type it is. Valid values are NONE (meaning no histogram), FREQUENCY, HEIGHT BALANCED, and, as of version 12.1, TOP-FREQUENCY and HYBRID.
• num_buckets is the number of buckets in the histogram. A bucket, or category as it’s called in statistics, is a group of values of the same kind. As described in the next section, histograms are composed of at least one bucket. If no histogram is available, it’s set to 1. The maximum number of buckets is 254 up to version 11.2, and 2,048 as of version 12.1.



Histograms
The query optimizer starts from the principle that data is uniformly distributed. An example of a uniformly distributed set of data is the one stored in the id column in the test table used throughout the previous sections. In fact, it stores all integers from 1 up to 1,000 exactly once. In such a case, to produce a good estimate of the number of rows filtered out by a predicate based on that column (for example, id BETWEEN 6 AND 19), the query optimizer requires only the object statistics described in the preceding section: the minimum value, the maximum value, and the number of distinct values. If data isn’t uniformly distributed, the query optimizer can’t compute acceptable estimations without additional information. For example, given the data set stored in the val2 column (see the output of the following query), how could the query optimizer make a meaningful estimation for a predicate like val2=105? It can’t, because it has no clue that about 50 percent of the rows fulfill that predicate:

SQL> SELECT val2, count(*)
  2  FROM t
  3  GROUP BY val2
  4  ORDER BY val2;

      VAL2   COUNT(*)
---------- ----------
       101          8
       102         25
       103         68
       104        185
       105        502
       106        212

The additional information needed by the query optimizer to get information about the nonuniform distribution
of data is called a histogram. Prior to version 12.1, two types of histograms are available: frequency histograms and height-balanced histograms. Oracle Database 12.1 introduces two additional types to replace height-balanced histograms: top frequency histograms and hybrid histograms.

■■Caution The dbms_stats package builds top frequency histograms and hybrid histograms only when the sampling used for gathering the object statistics is based on dbms_stats.auto_sample_size (later in this chapter, the “Gathering Options” section covers this topic).
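As a sketch of requesting a histogram explicitly through the method_opt parameter (table and column names from the running example; SIZE 254 is the pre-12.1 maximum bucket count mentioned above):

```sql
BEGIN
  -- Gather statistics on T and build a histogram (up to 254 buckets) on val2
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'T',
    method_opt => 'FOR COLUMNS val2 SIZE 254');
END;
/
```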


How long did my stats job run?

column job_name format a20
column status format a12
column actual_start_date format a36
column run_duration format a14
select job_name, status, actual_start_date, run_duration
from dba_scheduler_job_run_details
where job_name = 'GATHER_STATS_JOB'
order by actual_start_date;

JOB_NAME             STATUS       ACTUAL_START_DATE                RUN_DURATION
-------------------- ------------ -------------------------------- --------------
GATHER_STATS_JOB     SUCCEEDED    21-JAN-10 22.00.03.629836 +00:00 +000 02:26:12
GATHER_STATS_JOB     SUCCEEDED    22-JAN-10 22.00.05.236441 +00:00 +000 02:37:48

(OR)

set lines 100
col operation form a40 wrap head 'operation(on)'
col target form a1
spool show_auto_stat_runs.lst
select operation||decode(target,null,null,'-'||target) operation
      ,to_char(start_time,'YYMMDD HH24:MI:SS.FF4') start_time
      ,to_char(  end_time,'YYMMDD HH24:MI:SS.FF4') end_time
from dba_optstat_operations
order by start_time desc
/
clear colu
spool off
 
operation(on)                            START_TIME                END_TIME
----------------------------------- -------------------- --------------------
gather_database_stats(auto)         090518 06:09:43.3429 090520 00:00:52.7399
gather_database_stats(auto)         090517 22:00:01.0653 090518 06:01:02.1637 

Stats: Changing the statistics stale percentage in 11g

Oracle automatically determines which objects need statistics and which don't by checking stale_stats for those objects. If an object's statistics are stale (that is, more than 10% of its rows have changed), the job collects fresh statistics for it and skips the rest. We can change the staleness percentage from the default value to a required value.

 

Arguments of the DBMS_STATS.GATHER_DATABASE_STATS procedure:
 
 DBMS_STATS.GATHER_DATABASE_STATS (
   estimate_percent NUMBER   DEFAULT to_estimate_percent_type 
                                                (get_param('ESTIMATE_PERCENT')),
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT get_param('METHOD_OPT'),
   degree           NUMBER   DEFAULT to_degree_type(get_param('DEGREE')),
   granularity      VARCHAR2 DEFAULT GET_PARAM('GRANULARITY'), 
   cascade          BOOLEAN  DEFAULT to_cascade_type(get_param('CASCADE')),
   stattab          VARCHAR2 DEFAULT NULL, 
   statid           VARCHAR2 DEFAULT NULL,
   options          VARCHAR2 DEFAULT 'GATHER',
   objlist          OUT      ObjectTab,
   statown          VARCHAR2 DEFAULT NULL,
   gather_sys       BOOLEAN  DEFAULT TRUE,
   no_invalidate    BOOLEAN  DEFAULT to_no_invalidate_type (
                                     get_param('NO_INVALIDATE')),
   obj_filter_list ObjectTab DEFAULT NULL);
 
 
For further understanding, the options parameter of dbms_stats.gather_*_stats accepts the following values:

GATHER - gather statistics for all objects
GATHER AUTO - gather statistics for all objects needing new statistics
GATHER STALE - gather statistics for all objects having stale statistics
GATHER EMPTY - gather statistics for all objects having no statistics
LIST AUTO - list objects needing new statistics
LIST STALE - list objects having stale statistics
LIST EMPTY - list objects having no statistics
 
 
------Changing the stale percent of a particular table---SH.SALES


SQL> execute dbms_stats.set_table_prefs('SH','SALES','STALE_PERCENT',5);

STALE_PERCENT - This value determines the percentage of rows in a table that have to change before the statistics on that table are deemed stale and should be regathered. The default value is 10%.
This changes the stale_percent to 5 for this table, so when the stats job runs it looks for 5% staleness rather than 10% and collects the stats for this table accordingly.
We can also set the stale_percent at the database level and schema level using the dbms_stats.set_database_prefs & dbms_stats.set_schema_prefs subprograms.
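A minimal sketch of the database- and schema-level equivalents (SH is just the sample schema used above):

```sql
-- Set STALE_PERCENT to 5 for the whole database
SQL> execute dbms_stats.set_database_prefs('STALE_PERCENT','5');

-- Set STALE_PERCENT to 5 for every table in the SH schema
SQL> execute dbms_stats.set_schema_prefs('SH','STALE_PERCENT','5');
```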

Script for Staleness Percent for specific table

set heading on
set pages 1000
set lines 100
SELECT DT.OWNER,
DT.TABLE_NAME,
dtm.inserts,
dtm.deletes,
dtm.updates,
dtm.truncated,
dt.last_analyzed,
dt.num_rows,
ROUND ( (DELETES + UPDATES + INSERTS) / NUM_ROWS * 100) PERCENTAGE
FROM DBA_TABLES DT, DBA_TAB_MODIFICATIONS DTM
WHERE DT.OWNER = DTM.TABLE_OWNER
AND DT.TABLE_NAME = DTM.TABLE_NAME
AND NUM_ROWS > 0
--AND ROUND ( (DELETES + UPDATES + INSERTS) / NUM_ROWS * 100) >= 10
AND DT.TABLE_NAME='TSTS076'
ORDER BY 9 desc;


Query to check the percentage of change in statistics using dba_tab_statistics and dba_tab_modifications

 colu anlyzd_rows form 99999,999,999
 colu tot_rows form 99999,999,999
 colu tab_name form a45
 colu chngs form 99,999,999,999
 colu pct_c form 9999999990.99
 col truncated head 'Trun|cated' for a5 justify l
 select dbta.owner||'.'||dbta.table_name tab_name
 , dbta.num_rows anlyzd_rows
 , to_char(dbta.last_analyzed,'yyyy-mm-dd hh24:mi:ss') last_anlzd
 , nvl(dbta.num_rows,0)+nvl(dtm.inserts,0)-nvl(dtm.deletes,0) tot_rows
 , nvl(dtm.inserts,0)+nvl(dtm.deletes,0)+nvl(dtm.updates,0) chngs
 , (nvl(dtm.inserts,0)+nvl(dtm.deletes,0)+nvl(dtm.updates,0))/greatest(nvl(dbta.num_rows,0),1) pct_c
 , dtm.truncated
 from dba_tab_statistics dbta
 left outer join sys.dba_tab_modifications dtm
 on dbta.owner = dtm.table_owner
 and dbta.table_name = dtm.table_name
 and dtm.partition_name is null
 where stale_stats = 'YES'
 --and dbta.last_analyzed < sysdate - 1
 and dbta.owner = '&owner'
 order by dbta.last_analyzed desc;



Automatic Statistics Gathering

SQL> select client_name, status from dba_autotask_client;

CLIENT_NAME                     STATUS
------------------------------- --------
auto optimizer stats collection DISABLED
auto space advisor              ENABLED
sql tuning advisor              ENABLED



Use the dbms_auto_task_admin.enable procedure to enable the automatic statistics collection task:
 
SQL> begin dbms_auto_task_admin.enable(
2 client_name=>'auto optimizer stats collection',
3 operation=>NULL,
4 window_name=>NULL);
5 end;
6 /



PL/SQL procedure successfully completed.




Check the status of the auto optimizer stats collection task:


SQL> SELECT client_name,status from dba_autotask_client;


CLIENT_NAME                     STATUS
------------------------------- --------
auto optimizer stats collection ENABLED
auto space advisor              ENABLED
sql tuning advisor              ENABLED



To disable the statistics collection task, use the dbms_auto_task_admin.disable procedure:

 
SQL> begin
2 dbms_auto_task_admin.disable(
3 client_name=> 'auto optimizer stats collection',
4 operation=> NULL,
5 window_name=> NULL);
6 end;
7 /
PL/SQL procedure successfully completed.



To find out the beginning and ending times for the automatic statistics collection job:
 
SQL> select operation,target,start_time,end_time from dba_optstat_operations where operation='gather_database_stats(auto)';


OPERATION             START_TIME                   END_TIME
--------------------- ---------------------------- ----------------------------
gather_database_stats 26-APR-13 10.00.02.970000 PM 26-APR-13 10.03.13.671000 PM



To set default preferences at the database level, invoke the SET_DATABASE_PREFS procedure:
 
SQL> execute dbms_stats.set_database_prefs('ESTIMATE_PERCENT','20');



To find the value of the current setting of the STALE_PERCENT parameter:
 
SQL> select dbms_stats.get_prefs ('STALE_PERCENT','SH') stale_percent from dual;


STALE_PERCENT
-------------
10



DBMS_STATS.AUTO_CASCADE
This means that Oracle will determine whether to collect any index statistics.


DBMS_STATS.AUTO_DEGREE
Oracle determines the degree of parallelism based on the size of the object. If the object is small enough, Oracle collects statistics serially, and if the object is large, Oracle uses the default degree of parallelism based on the number of CPUs. Note that the default degree is NULL, which means that the database collects statistics using parallelism only if we set the degree of parallelism at the table level with the DEGREE clause.
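A sketch of passing this constant explicitly through the degree parameter (the table name is illustrative):

```sql
-- Let Oracle choose the degree of parallelism based on the object's size
SQL> exec dbms_stats.gather_table_stats(NULL, 'MASSIVE_TABLE', degree => dbms_stats.auto_degree);
```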

ESTIMATE_PERCENT
This specifies the percentage of rows the database must use to estimate the statistics. For large tables, estimation is the only way to complete the statistics collection process within a reasonable time.
We can set a value between 0 and 100 for the estimate_percent parameter. If a table's data is highly skewed, we should use a higher sample size to capture the variations in the data distribution.


NDV (number of distinct values)
The NDV count, which is a key statistic calculated by setting the estimate_percent parameter to DBMS_STATS.AUTO_SAMPLE_SIZE, is statistically identical to the NDV count calculated by a 100 percent complete statistics collection. The best practice is to start with AUTO_SAMPLE_SIZE and set your own sample size only if you must.

SQL> exec dbms_stats.gather_table_stats(NULL, 'MASSIVE_TABLE', estimate_percent=> dbms_stats.auto_sample_size);



Statistics gathering for partitioned tables:

• ALL: Gathers subpartition-, partition-, and global-level statistics; this setting provides an
accurate set of table statistics but is extremely resource-intensive and takes much longer to
complete than a statistics collection job with the other options.


• GLOBAL: Gathers just global statistics for a table.


• PARTITION: Gathers only partition-level statistics. The partition-level statistics are rolled up at
the table level and may not be very accurate at the table level.


• GLOBAL AND PARTITION: Gathers the global- and partition-level statistics but not the
subpartition-level statistics.


• SUBPARTITION: Gathers only subpartition statistics.


• AUTO: This is the default value for the GRANULARITY parameter and determines the granularity
based on the partitioning type.
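A minimal sketch of passing one of these values through the granularity parameter (SH.SALES is the sample schema table used earlier in this post):

```sql
BEGIN
  -- Gather global- and partition-level statistics only, skipping subpartitions
  dbms_stats.gather_table_stats(
    ownname     => 'SH',
    tabname     => 'SALES',
    granularity => 'GLOBAL AND PARTITION');
END;
/
```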




Manually Generating Statistics : -- Table Level

A. Volatile Tables
If your database contains volatile tables that experience numerous deletes (or even truncates) throughout the day,
then an automatic stats collection job that runs during the nightly maintenance window isn’t adequate. 

There are a couple of strategies we can use in this type of situation, where a table's data keeps fluctuating throughout the day:

1.Collect the statistics when the table has the number of rows that represent its “typical” state.
Once we do this, lock the statistics to prevent the automatic statistics collection job from
collecting fresh statistics for this table during the nightly maintenance window.


2. The other option is to make the statistics of the table null. We make the statistics of a table null by deleting the statistics first and then locking the table's statistics right after that.


example:
SQL> execute dbms_stats.delete_table_stats('OE','ORDERS');
PL/SQL procedure successfully completed.


SQL> execute dbms_stats.lock_table_stats('OE','ORDERS');
PL/SQL procedure successfully completed.
SQL>



B. Bulk Loaded Tables (dynamic statistics)
For tables that you bulk load data into, you must collect statistics immediately after loading the data. If you don't collect statistics right after bulk loading data into a table, the database can still use dynamic statistics to estimate the statistics, but these statistics aren't as comprehensive as the statistics that you collect.


In releases prior to Oracle Database 12c, you’d use the capability to lock statistics when a large table’s data
was frequently refreshed, so you’d have statistics available immediately after data loading. Locking statistics in this
case prevents new statistics from being collected. The optimizer uses the old statistics once the data is loaded and
available. 


In Oracle Database 12c, the database automatically gathers table statistics during the following two types of
bulk data loads:


• Create table as select to create a table and load it
• Insert into ...select into an empty table by using the direct path insert technique 


Note: the online statistics feature gathers only table statistics, not index statistics.
 

You can therefore run DBMS_STATS.GATHER_INDEX_STATS after the bulk load is completed to gather the index statistics and histograms, if you need them. 


You can also enable stats gathering at the statement level by specifying the GATHER_OPTIMIZER_STATISTICS hint.
Automatic statistics gathering during bulk loads doesn't happen when a table has the following characteristics:
• The table is in an Oracle-owned schema such as SYS.
• It’s a nested table.
• It’s an index-organized table (IOT).
• It’s an external table.
• It’s a global temporary table defined as ON COMMIT DELETE ROWS.
• It’s a table with virtual columns.
• It’s a table whose PUBLISH preference is set to FALSE.

• It’s a partitioned table where INCREMENTAL is set to true and extended syntax is not used. 
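A minimal sketch of the 12c online statistics gathering during a CTAS (the target table name is illustrative):

```sql
-- Bulk load via CTAS: 12c gathers table statistics during the load itself
SQL> CREATE TABLE sales_copy AS SELECT * FROM sh.sales;

-- Table statistics are already populated, without a separate gather call
SQL> SELECT num_rows, blocks
  2  FROM user_tab_statistics
  3  WHERE table_name = 'SALES_COPY';
```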


Locking Statistics for a Table/Schema


You can lock the statistics for a table or a schema to freeze them. Lock a table's or a schema's statistics by executing the appropriate DBMS_STATS.LOCK_* procedures.

EXAMPLE

To lock a table’s statistics with the LOCK_TABLE_STATS procedure in the DBMS_STATS package
SQL> execute dbms_stats.lock_table_stats(ownname=>'SH',tabname=>'SALES');

To unlock the table's statistics:
SQL> execute dbms_stats.unlock_table_stats(ownname=>'SH',tabname=>'SALES');

To lock a schema's statistics with DBMS_STATS.LOCK_SCHEMA_STATS:
SQL> execute dbms_stats.lock_schema_stats('SH');

To unlock the statistics with the UNLOCK_SCHEMA_STATS procedure:
SQL> execute dbms_stats.unlock_schema_stats('SH');
 


Handling Missing Statistics (Dynamic sampling )
(dynamic statistics, dynamic sampling)

The tables have had data loaded into them outside the nightly batch window. You can’t collect statistics on the table during the day when the database is handling other workload.

Use dynamic statistics to compensate for missing statistics.

In earlier releases, the database always gathered dynamic statistics by default when it was confronted by a table with missing optimizer statistics.

 In Oracle Database 12c, the optimizer determines whether dynamic statistics collection is useful in a particular case, as well as the statistics level to use. In  12c, not only missing statistics but also insufficient statistics can trigger dynamic statistics collection by the optimizer. By default, when optimizer statistics are either missing or not sufficient, the database automatically collects dynamic statistics.

The database will scan a random sample of data blocks in a table when you enable dynamic sampling. You
enable/disable dynamic sampling in the database by setting the optimizer_dynamic_sampling initialization
parameter. Dynamic sampling is enabled by default, as you can see from the following:

The relevant initialization parameter is optimizer_dynamic_sampling:

SQL> show parameter dynamic

NAME                       TYPE    VALUE
-------------------------- ------- -----
optimizer_dynamic_sampling integer 2


SQL> alter system set optimizer_dynamic_sampling=4 scope=both;
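Dynamic sampling can also be requested per statement via the DYNAMIC_SAMPLING hint; a sketch against the test table T used earlier in this post:

```sql
-- Request level-4 dynamic sampling for this statement only
SQL> SELECT /*+ dynamic_sampling(t 4) */ count(*)
  2  FROM t
  3  WHERE val2 = 105;
```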

The optimizer decides whether to use dynamic statistics based on factors such as:
• SQL statements use parallel execution.
• You’ve created a SQL plan directive.
• The SQL statement was captured in SQL Plan Management or Automatic Workload Repository
or is currently in the shared SQL area.



Note the sample size used for dynamic statistics at the various sampling levels:

• Level 0: Doesn't use dynamic statistics.
• Level 1: Uses dynamic statistics for all tables that don't have statistics, provided there's at least one non-partitioned table without statistics in the query; the table must also not have any indexes, and it must be larger than 32 blocks, which is the sample size for this level.
 

• Level 2: Uses dynamic statistics if at least one table in the query has no statistics; the sample size is 64 blocks.

• Level 3: Uses dynamic statistics if the query meets the level 2 criteria and it has one or more expressions in a WHERE clause predicate; the sample size is 64 blocks.
 

• Level 4: Uses dynamic statistics if the query meets all level 3 criteria, and in addition, it uses complex predicates such as an OR/AND operator between multiple predicates; the sample size is 64 blocks.
 

• Levels 5–9: Use dynamic statistics if the statement meets the level 4 criteria; each of these levels differs only in the sample size, which ranges from 128 to 4,096 blocks.
 

• Level 10: This level uses dynamic statistics for all statements, and the sample it uses isn’t really a sample because it checks all data blocks to get the statistics.
 

• Level 11: Uses dynamic statistics automatically when the optimizer deems it necessary. The statistics collected by setting this level are stored and made available to other queries.  

 Dynamic statistics do help in getting better cardinality estimates but are more suitable for longer-running queries in a data warehouse or a decision support system, 
rather than for queries in an OLTP database, because of the overhead involved. You must also keep in mind that the statistics collected through dynamic sampling are by no means the same as the statistics collected through the DBMS_STATS procedures.



Exporting & Importing Statistics

You can export optimizer statistics from one database to another by using the DBMS_STATS.EXPORT_*_STATS procedures.


To import, you execute one of the DBMS_STATS.IMPORT_*_STATS procedures to bring the statistics into a different database. You can export statistics at the table, schema, or database level.


 1. Create a table to hold the exported statistics:

SQL> execute dbms_stats.create_stat_table(ownname=>'SH',stattab=>'mytab', tblspace=>'USERS')

2. Export the statistics for the table SH.SALES from the data dictionary into the mytab table, using the DBMS_STATS.EXPORT_*STATS procedure:


SQL> exec dbms_stats.export_table_stats(ownname=> 'SH',tabname=>'SALES',stattab=>'mytab')

3. Export and import the table mytab to the other database.


4. In a different database, import the statistics using the DBMS_STATS.IMPORT_*STATS procedure:

SQL> exec dbms_stats.import_table_stats(ownname=>'SH',tabname=>'SALES',stattab=>'MyTab',no_invalidate=>true); 



Restoring Previous Versions of Statistics 

The performance of certain queries has deteriorated suddenly after collecting fresh statistics. You want to see whether you can use an older set of statistics that you know worked well. 

Before you restore older statistics, check how far back you can go to restore older statistics 

select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY
-----------------------------------------
19-APR-13 07.49.26.718000000 AM -04:00


The output shows that we can restore statistics to any timestamp more recent than the timestamp shown, which is 19-APR-13 07.49.26.718000000 AM -04:00.

SQL> exec dbms_stats.restore_schema_stats(ownname=>'SH',as_of_timestamp=>'19-OCT-13 01.30.31.323000 PM -04:00',no_invalidate=>false)


To check the retention of stats (the database automatically purges statistics it collected more than 31 days ago):

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
31


To change the statistics history retention (for example, to 60 days):
SQL> exec dbms_stats.alter_stats_history_retention(retention=>60); 


You can similarly restore statistics for a database with the RESTORE_DATABASE_STATS procedure or for a table with the RESTORE_TABLE_STATS procedure. You can also restore dictionary stats with the RESTORE_DICTIONARY_STATS procedure and system stats with the RESTORE_SYSTEM_STATS procedure.
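A minimal sketch of the table-level variant (SH.SALES as in the earlier examples; the timestamp expression is illustrative):

```sql
BEGIN
  -- Restore SH.SALES statistics to those in effect one day ago
  dbms_stats.restore_table_stats(
    ownname         => 'SH',
    tabname         => 'SALES',
    as_of_timestamp => systimestamp - INTERVAL '1' DAY);
END;
/
```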




Gathering System Statistics (noworkload vs. workload)

You know the optimizer uses I/O and CPU characteristics of a system during the selection of an execution plan. You want to ensure that the optimizer is using accurate system statistics.

You can collect two types of system statistics to capture your system’s I/O and CPU characteristics. 
You can collect workload statistics or noworkload statistics to enable the optimizer to better estimate the true I/O and CPU costs, which are a critical part of query optimization.

----Noworkload statistics
SQL> execute dbms_stats.gather_system_stats() 


When you collect the noworkload system statistics, the database captures only the cpuspeedNW, ioseektim, and iotfrspeed system statistics.

SQL> select pname, pval1 from sys.aux_stats$ where sname = 'SYSSTATS_MAIN';

PNAME                PVAL1
----------------- ----------
CPUSPEED
CPUSPEEDNW        1183.90219
IOSEEKTIM                 10
IOTFRSPEED              4096
MAXTHR
MBRC
MREADTIM
SLAVETHR
SREADTIM

9 rows selected.



---Workload statistics
Workload statistics are gathered while the database is processing a typical workload. These system statistics are more representative of actual system I/O and CPU characteristics and present a more accurate picture of the system hardware to the optimizer. You collect workload system statistics with the start and stop options:

SQL> execute dbms_stats.gather_system_stats('start')

SQL> execute dbms_stats.gather_system_stats('stop')

SQL> select pname, pval1 from sys.aux_stats$ where sname = 'SYSSTATS_MAIN';

PNAME                PVAL1
---------------- ----------
CPUSPEED               2040
CPUSPEEDNW          2039.06
IOSEEKTIM            14.756
IOTFRSPEED             4096
MAXTHR
MBRC                      7
MREADTIM          46605.947
SLAVETHR
SREADTIM          51471.538

9 rows selected.


Capturing workload statistics over a fixed time frame, e.g., 90 minutes:
SQL> execute dbms_stats.gather_system_stats('interval',90); 


Accurate system statistics are critical for the optimizer’s evaluation of alternative execution plans. It’s through its estimates of various system performance characteristics such as I/O speed and CPU speed that the optimizer calculates the cost of, say, a full table scan versus an indexed read. 

• cpuspeedNW: The noworkload CPU speed, in millions of operations per second.
• ioseektim: The I/O seek time, in milliseconds; this is the sum of seek time, latency time, and OS overhead time.
• iotfrspeed: The I/O transfer speed, in bytes per millisecond; this tells the optimizer how fast the database can read data in a single read request.
• cpuspeed: The CPU speed measured during a workload statistics collection.
• maxthr: The maximum I/O throughput.
• slavethr: The average parallel slave I/O throughput.
• sreadtim: The single-block read time; the average time (in milliseconds) for a random single-block read.
• mreadtim: The multiblock read time; the average time (in milliseconds) for a sequential multiblock read.
• mbrc: The multiblock read count; the average multiblock read count, in blocks.



 If the database performs any full table scans during the workload statistics collection period, Oracle uses the value of the mbrc and the mreadtim statistics to estimate the cost of a full table scan. In the absence of these two statistics, the database uses the value of the db_file_multiblock_read_count parameter to estimate the cost of full table scans.
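
As a rough sketch of how these statistics feed the cost model, the I/O cost of a full scan of a table with B blocks is approximately ceil(B / mbrc) * (mreadtim / sreadtim) single-block-read units. The query below computes that estimate directly from sys.aux_stats$ for a hypothetical 10,000-block table (the 10,000 figure is purely illustrative, and the query assumes workload statistics have been gathered):

SQL> select ceil(10000 / ms.mbrc) * (ms.mreadtim / ms.sreadtim) as approx_fts_cost
  2    from (select max(decode(pname,'MBRC',pval1))     mbrc,
  3                 max(decode(pname,'MREADTIM',pval1)) mreadtim,
  4                 max(decode(pname,'SREADTIM',pval1)) sreadtim
  5            from sys.aux_stats$
  6           where sname = 'SYSSTATS_MAIN') ms;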
 

You can delete all system statistics by executing the DELETE_SYSTEM_STATS procedure:
SQL> execute dbms_stats.delete_system_stats()



Validating New Statistics (at the table level)

You're collecting new statistics, but you don't want the database to use them automatically until you've confirmed that they won't bring in worse performance than what you have now.

Example: how to keep the database from automatically publishing new statistics for a table.

The following statement sets the preference for the PUBLISH parameter to false (default: true) for the SH.SALES table; the unpublished statistics are called pending statistics.

 
SQL> execute dbms_stats.set_table_prefs('SH','SALES','PUBLISH','false');


Collect new statistics for the SH.SALES table:
SQL> exec dbms_stats.gather_table_stats('sh','sales'); 


Tell the optimizer to use the newly collected pending statistics so you can test your queries with those statistics:
SQL> alter session set optimizer_use_pending_statistics=true; 


Test by running a workload against the SH.SALES table and checking the performance and the execution plans.
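
While the statistics are pending, you can inspect them in the standard DBA_TAB_PENDING_STATS dictionary view:

SQL> select table_name, num_rows, last_analyzed from dba_tab_pending_stats where owner = 'SH';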


If you’re happy with the new set of (pending) statistics, make them public using publish procedure: 
execute dbms_stats.publish_pending_stats('SH','SALES'); 

If performance is poor with the new statistics and you want to delete them instead:
SQL> exec dbms_stats.delete_pending_stats('SH','SALES'); 


Relevant parameters and procedures:
_optimizer_invalidation_period (hidden parameter)
optimizer_use_pending_statistics (parameter)
dbms_stats.delete_pending_stats (procedure)


We can also do this at the schema level, but not at the database level. When working at the schema level (here the schema name is SH), run the following statements instead:

SQL> execute dbms_stats.set_schema_prefs('SH','PUBLISH','false');
SQL> execute dbms_stats.publish_pending_stats(null,null);
SQL> execute dbms_stats.delete_pending_stats('SH');




Forcing the Optimizer to Use an Index (verify the effect in the explain plan)

You believe that using a certain index on a column will speed up a query, but the optimizer doesn't use the index in its execution plans, and you want to force it to.

You can force the optimizer to use an index when it isn't doing so by adjusting the optimizer_index_cost_adj initialization parameter. You can set this parameter at the system or session level.

example that shows how to set this parameter at the session level:
SQL> alter session set optimizer_index_cost_adj=50; 


The default value for the optimizer_index_cost_adj parameter is 100, and you can set the parameter to a value between 0 and 10000. The lower the value of the parameter, the more likely it is for the optimizer to use an index. 

related parameter :
optimizer_index_cost_adj 


You can also use a more "scientific" way to figure out the correct setting for the optimizer_index_cost_adj parameter by setting it to a value that reflects the "true" difference between single and multiblock reads.

SQL> select event, average_wait from v$system_event  where event like 'db file s%read';
 

EVENT                     AVERAGE_WAIT
------------------------- ------------
db file sequential read            .91
db file scattered read            1.41



Based on the output of this query, single-block (sequential) reads take roughly 65 percent of the time it takes to perform a multiblock (scattered) read (.91/1.41 ≈ 0.65).
This indicates that the optimizer_index_cost_adj parameter should be set to somewhere around 65. However, as mentioned earlier, setting the parameter at the database level isn't recommended; instead, use this parameter sparingly for specific statements where you want to force the use of an index.
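
The same calculation can be done directly in SQL (a sketch; the averages in V$SYSTEM_EVENT are cumulative since instance startup, so treat the result as a rough guide only):

SQL> select round(100 * seq.average_wait / scat.average_wait) as suggested_value
  2    from (select average_wait from v$system_event where event = 'db file sequential read') seq,
  3         (select average_wait from v$system_event where event = 'db file scattered read') scat;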




Enabling Query Optimizer Features
(In a higher-version database you can run the optimizer as a lower version to avoid SQL regressions. You can also test SQL on a higher-version test database with the optimizer set to the lower version, tune the queries there, and then export the tuned statistics from the test database to the production database.)

You’ve upgraded your database, but you want to ensure the query plans don’t change because of new optimizer features in the new release. 

The optimizer_features_enable initialization parameter sets the optimizer features level for an entire database. You can, however, set it just at the session level (alter session ...) to test for regressions in execution plans following an upgrade. You can also specify the release number with a hint so you can test a query with optimizer features from a specific release, as shown here in an 11.2 release database:


alter system set optimizer_features_enable='11.2.0.4';

show parameter optimizer_features_enable;

NAME                        TYPE    VALUE
--------------------------- ------- --------
optimizer_features_enable   string  11.2.0.4


SQL> select /*+ optimizer_features_enable('11.1.0.6') */ product_id, sum(amount_sold) from sales group by product_id;



Keeping the Database from Creating Histograms
(Histograms are used when a column's values warrant a change to the execution plan. Note that when the optimizer notices there's a histogram on a column, it treats the data in that column as skewed.)



You think that the presence of a histogram on a particular column is leading to suboptimal execution plans. You want the database not to use any histograms on that column.
Histograms may help the Oracle optimizer decide between using an index and a full-table scan (where index column values are skewed), or help the optimizer determine the fastest table join order. For determining the best table join order, the WHERE clause of the query can be inspected along with the execution plan for the original query. If the cardinality of a table is high, histograms on the most selective column in the WHERE clause tip off the optimizer and can change the table join order.

Oracle histogram statistics should be created when you have a highly skewed column, where some values have a disproportionate number of rows. In the real world, this is quite rare, and one of the most common mistakes with the CBO is the unnecessary introduction of histograms into the CBO statistics. As a general rule, histograms are used when a column's values warrant a change to the execution plan.



1. Drop the histogram by executing the DELETE_COLUMN_STATS procedure:

begin
dbms_stats.delete_column_stats(ownname=>'SH',tabname=>'SALES',
colname=>'PROD_ID',col_stat_type=>'HISTOGRAM');
end;
 /

PL/SQL procedure successfully completed.

2. Once you drop the histogram, tell Oracle not to create a histogram on the PROD_ID column by executing the following SET_TABLE_PREFS procedure:


begin
  dbms_stats.set_table_prefs('SH','SALES','METHOD_OPT',
    'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 PROD_ID');
end;
/
PL/SQL procedure successfully completed.
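
To confirm that no histogram is created on PROD_ID after the next statistics gathering, check the HISTOGRAM column of DBA_TAB_COLS (it shows NONE when no histogram exists):

SQL> select column_name, histogram from dba_tab_cols where owner = 'SH' and table_name = 'SALES' and column_name = 'PROD_ID';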
 



Understanding Adaptive Cursor Sharing (user-defined bind variables)

 adaptive cursor sharing enables a SQL statement with bind variables to use multiple execution plans, with each execution plan based on the values of the bind variables. Adaptive cursor sharing is enabled by default, and you can’t disable it.

Our test table, DEMO, has 78,681 rows. The data has three columns, which are all skewed. Thus, when we gathered statistics for this table, we created histograms on the three columns, as shown here:

SQL> select column_name,table_name,histogram from user_TAB_COLUMNS where table_name='DEMO';


COLUMN_NAME  TABLE_NAME  HISTOGRAM
------------ ----------- ---------------
RNUM         DEMO        HEIGHT BALANCED
RNAME        DEMO        HEIGHT BALANCED
STATUS       DEMO        FREQUENCY


Note that when the optimizer notices there's a histogram on a column, it treats the data in that column as skewed. The column STATUS has two values: Coarse and Fine. Only 157 rows have the value Coarse, and 78,524 rows have the value Fine, making the data extremely skewed.

Let’s perform a sequence of operations to illustrate how adaptive cursor sharing works. Issue a query with the bind variable set to the value Coarse. Since very few rows in the DEMO table have this value, we expect the database to use an index range scan, which is exactly what the optimizer does. Here is our query and its execution:


SQL> var b varchar2(6)
SQL> exec :b:='Coarse';


PL/SQL procedure successfully completed.


SQL> select /*+ ACS */ count(*) from demo where status = :b;
COUNT(*)
----------
157
SQL> select * from table(dbms_xplan.display_cursor); 



To check whether the database has marked the cursor as bind-sensitive, bind-aware, or both:
 

SQL> select child_number, executions, buffer_gets, is_bind_sensitive as "BIND_SENSI",
  2         is_bind_aware as "BIND_AWARE", is_shareable as "BIND_SHARE"
  3    from v$sql
  4   where sql_text like 'select /*+ ACS */%';



CHILD_NUMBER EXECUTIONS BUFFER_GETS BIND_SENSI BIND_AWARE BIND_SHARE
------------ ---------- ----------- ---------- ---------- ----------
           0          1          43 Y          N          Y



Bind-sensitive: the cursor is marked bind-sensitive because there's a histogram on the column STATUS.
Bind-aware: if the execution statistics of subsequent executions differ significantly, the database marks the cursor as bind-aware. One of the inputs the database uses in deciding whether to mark a statement as bind-aware is the number of rows processed.
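
You can dig further into adaptive cursor sharing decisions through the V$SQL_CS_HISTOGRAM, V$SQL_CS_SELECTIVITY, and V$SQL_CS_STATISTICS views. For example, to see the row-count buckets recorded per child cursor (&sql_id is a substitution variable for the statement's sql_id):

SQL> select child_number, bucket_id, count from v$sql_cs_histogram where sql_id = '&sql_id';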


To examine the skew in a particular table column, query the histogram endpoints (shown here for the DEMO table):

select column_name, endpoint_number, endpoint_value
from user_tab_histograms
where table_name='DEMO';



Creating Statistics for Related Columns :

You’re aware that certain columns from a table that are part of a join condition are correlated. You want to make the optimizer aware of this relationship. 


To generate statistics for two or more related columns, you must first create a column group and then collect fresh
statistics for the table so the optimizer can use the newly generated “extended statistics.” Use the DBMS_STATS.CREATE_EXTENDED_STATS function to define a column group that consists of two or more columns from a table. 



select dbms_stats.create_extended_stats(null,'CUSTOMERS', '(country_id,cust_state_province)') from dual;
 

DBMS_STATS.CREATE_EXTENDED_STATS(NULL,'CUSTOMERS','(COUNTRY_ID,CUST_STATE_PROVI
-------------------------------------------------------------------------------
SYS_STUJGVLRVH5USVDU$XNV4_IR#4

Once you create the column group, gather fresh statistics for the CUSTOMERS table to generate statistics for the new
column group.
SQL> exec dbms_stats.gather_table_stats(null,'customers'); 
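
You can verify the new column group through the standard USER_STAT_EXTENSIONS dictionary view:

SQL> select extension_name, extension from user_stat_extensions where table_name = 'CUSTOMERS';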




Automatically Creating Column Groups



Maintaining Statistics on Partitioned Tables

You load data into one or more partitions frequently, and the maintenance of global statistics is a problem. You want to collect new global statistics without having to go through a time- and resource-consuming process.

1. Turn on incremental statistics collection for the SH.SALES table:
SQL> exec dbms_stats.set_table_prefs('SH','SALES','INCREMENTAL','TRUE');

2. After each load into a partition, gather global table-level statistics as shown here:
SQL> exec dbms_stats.gather_table_stats('SH','SALES');

For the incremental statistics collection feature to work on partitioned tables, you must leave the ESTIMATE_PERCENT parameter at AUTO_SAMPLE_SIZE and set GRANULARITY to AUTO.
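
These preferences can be set explicitly with the SET_TABLE_PREFS procedure (shown here for SH.SALES):

SQL> exec dbms_stats.set_table_prefs('SH','SALES','ESTIMATE_PERCENT','DBMS_STATS.AUTO_SAMPLE_SIZE');
SQL> exec dbms_stats.set_table_prefs('SH','SALES','GRANULARITY','AUTO');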



Concurrent Statistics Collection for Large Tables (reduce the time statistics collection takes on large tables by enabling concurrent statistics collection)

You want to minimize the amount of time it takes to gather statistics by taking advantage of your multiprocessor environment.
Concurrent statistics gathering mode gathers statistics on multiple tables, and on multiple partitions (and subpartitions) within a table, concurrently. By doing this, you can take advantage of your multiprocessor environment and complete the statistics collection process much faster. By default, concurrent statistics gathering is disabled. You enable it by executing the SET_GLOBAL_PREFS procedure. Follow these steps to enable concurrent statistics gathering:


 1. Set the job_queue_processes parameter to at least 4.
SQL>alter system set job_queue_processes=4;
 

If you don’t plan on using parallel execution for gathering statistics (see the following section) but want
to fully utilize your system resources, you must set the job_queue_processes parameter to two times the
number of CPU cores on the server.
 

2. Enable concurrent statistics gathering.
SQL> begin
dbms_stats.set_global_prefs('CONCURRENT','TRUE');
end;
/
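
To confirm the setting, read the preference back with the standard GET_PREFS function:

SQL> select dbms_stats.get_prefs('CONCURRENT') from dual;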


Monitoring Concurrent Stats Collection Jobs

SQL> select job_name,state,comments from dba_scheduler_jobs where job_class like 'CONC%'; 



Determining When Statistics Are Stale (stale statistics)

You can query the DBA_TAB_STATISTICS and DBA_IND_STATISTICS views to determine whether statistics are stale for any object.
The STALE_STATS column in these two views tells you whether the statistics are stale.
 

The STALE_STATS column can take one of the following values:
• YES: The statistics are stale.
• NO: The statistics aren’t stale.
• NULL: No statistics were collected for the object.
 

For the stale/fresh information to be recorded in these two views, table monitoring must be enabled for the object. However, you don't have to set this explicitly if you've already set the STATISTICS_LEVEL initialization parameter to TYPICAL or ALL. To see up-to-date staleness information, first flush the database monitoring info:

SQL> BEGIN
DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
END;
/


Once you flush the database monitoring information as shown here, you can query the STALE_STATS column in the views DBA_TAB_STATISTICS and DBA_IND_STATISTICS to determine whether the statistics are stale. Here's an example that shows how to query the DBA_TAB_STATISTICS view to determine whether the statistics for the SH.SALES table are stale:

SQL> SELECT PARTITION_NAME, STALE_STATS
FROM DBA_TAB_STATISTICS
WHERE TABLE_NAME = 'SALES'
AND OWNER = 'SH'
ORDER BY PARTITION_NAME;



PARTITION_NAME STALE_STATS
--------------- -----------
SALES_2010 NO
SALES_2011 NO
SALES_H1_2012 NO
SALES_H2_2012 NO
SALES_Q1_2013 NO
SALES_Q1_2014 NO
