Search

Tuesday, June 21, 2016

Oracle Histograms

Histogram Holds data the data about values within a column in a table and number of occurrences for specific value or range. Depend on predicate values used in a query it stores the range of the index. Optimizer uses the histogram to generate execution plan, it may generate a plan with Index full table scan or Index range scan. Histograms are useful when we have distinct data in the table. The information about histogram is stored in data dictionary view "dba_tab_col_statistics". METHOD_OPT parameter is used to control the creation of histograms 

creating histogram on table depends on what columns we are gathering on table. we can use different operations while creating histogram on tables 

METHOD_OPT:
FOR COLUMNS SIZE AUTO: Fastest. you can specify one column instead of all columns.
FOR ALL COLUMNS SIZE REPEAT: Prevent deletion of histograms and collect it only for columns already have histograms.
FOR ALL COLUMNS:collect histograms on all columns .
FOR ALL COLUMNS SIZE SKEWONLY:collect histograms for columns have skewed value .
FOR ALL INDEXES COLUMNS:collect histograms for columns have indexes.

when we use Auto OPTION while gathering or creating histogram on the table, Oracle will track the need of histograms depend on the column usage from sys.col_usage$. using this info internally it creates buckets (histogram) on the columns and stores access  info ( range is stored inside the bucket )

Exec dbms_stats.gather_table_stats
(ownname => 'UNIVERSE', tabname=>'INDIA'
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'FOR COLUMNS SIZE AUTO',
degree => 4); 

For example Creating histogram for frequently used column or frequency histogram :)
If the column is repeatedly access by the application. we can create histogram on the column.Here we are creating histogram for column "state" in schema universe & table india

Exec DBMS_STATS.GATHER_TABLE_STATS 
(ownname => 'UNIVERSE',
tabname => 'INDIA',
method_opt => 'FOR COLUMNS STATE',
degree => 4);

Creating Hybrid - histogram in 12c: 
before creating hybrid column, check for distinct values on table.
method_opt => 'FOR COLUMNS STATE size 10', 10 buckets are created and access info ( range is stored inside the bucket 

Exec DBMS_STATS.GATHER_TABLE_STATS 
(ownname => 'UNIVERSE',
tabname => 'INDIA',
method_opt => 'FOR COLUMNS STATE size 10',
degree => 4);

The optimizer considers both hybrid and top frequency histograms. To qualify for a top frequency histogram, the percentage of rows occupied by the top 10 most frequent values must be equal to or greater than threshold p, where p is (1-(1/10))*100, or 90%.

Query to investigate histogram  created for column "STATE" 

select table_name,column_name,num_distinct, histogram from user_tab_col_statistics where table_name='INDIA' and column_name='STATE';

Limitations 
Never create histogram on the primary key and have a less distinct values