Predicate Selectivity
Selectivity
~~~~~~~~~~~
Selectivity is a measure of the proportion of a row source retrieved by application of a particular predicate or combination of predicates.
Within the Oracle kernel it is expressed as a value between 0 and 1. The closer the value is to 0 the more selective the predicate is.
Selectivity is only used by the CBO.
Basic Selectivity formula:
~~~~~~~~~~~~~~~~~~~~~~~~~~
Number of records satisfying a condition
Selectivity = -----------------------------------------
Total Number of records
In the optimizer, selectivity is used to compare the usefulness of various predicates in combination with base object costs.
Knowing the proportion of the total data set that a column predicate defines is very helpful in defining actual access costs.
By default, column selectivity is based on the high and low values and the number of values in the column with an assumption of even distribution of data between these two points.
Histogram data can give better selectivity estimates for unevenly distributed column data. There is more discussion regarding Histograms later.
Selectivity is also used to define the cardinality of a particular row source once predicates have been applied.
Cardinality is the expected number of rows that will be retrieved from a row source. Cardinality is useful in determining nested loop join and sort costs. Application of selectivity to the original cardinality of the row source will produce the expected (computed) cardinality
for the row source.
Glossary of Terms:
~~~~~~~~~~~~~~~~~~
NDV Number of Distinct Values
Cardinality Number of rows
Selectivity Proportion of a dataset returned by a particular predicate(or group of predicates)
In the following illustrations there are 2 tables (T1 & T2) with columns (c1) and (c2) respectively.
Selectivities:
~~~~~~~~~~~~~~
Without histograms
~~~~~~~~~~~~~~~~~~
c1 = '4076' 1/NDV
c1 > '4076' 1 - (High - Value / High - Low)
c1 >= '4076' 1 - (High - Value / High - Low) + 1/NDV
c1 like '4076' 1/NDV
Join selectivity
~~~~~~~~~~~~~~~~
The selectivity of a join is defined as the selectivity of the most selective
join column adjusted by the proportion of not null values in each join column.
Sel = 1/max[NDV(t1.c1),NDV(t2.c2)] *
( (Card t1 - # t1.c1 NULLs) / Card t1) *
( (Card t2 - # t2.c2 NULLs) / Card t2)
Bind Variable selectivity
~~~~~~~~~~~~~~~~~~~~~~~~~
Bind variables present a special case because the optimizer has no idea what
the bind variable value is prior to query optimization. This does not present
a problem with equality predicates since a uniform distribution of data is
assumed and the selectivity is taken as 1/NDV for the column. However for
range predicates it presents a major issue because the optimizer does not
know where the range starts or stops. Because of this the optimizer has to
make some assumptions as follows:
c1 = :bind1 1/NDV
c1 > :bind1 Default of 5%
c1 >= :bind1 Default of 5%
c1 like :bind1 Default of 25%
For more information on bind variables see Note:70075.1
Selectivity With Histograms
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Histograms provide additional information about column selectivity for columns whose distribution is non uniform. Histograms store information about column data value ranges. Each range is stored in a single row and is often called a 'bucket'. There are 2 different methods for storing histograms in
Oracle. If there are a small number of distinct column values (i.e. less than the number of buckets), the column value and the count of that value is stored. If not then a series of endpoints are stored to enable more accurate selectivity to be determined.
The first method allows the accurate figures to be used. However with inexact histograms the terms popular and non-popular value are introduced and are used to help determine selectivity. A popular value is a value that spans multiple endpoints whereas a non-popular value does not.
See Note:72539.1 for more information on histograms.
Exact histograms
~~~~~~~~~~~~~~~~
c1 = '4706' count of value '4076' / Total Number of Rows
c1 > value count of values > '4076' / Total Number of Rows
In Exact Histograms
~~~~~~~~~~~~~~~~~~
col = pop value # popular buckets / # buckets
col = non pop (Density)
col > value # buckets > value / # buckets
Rules for combining selectivity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let P1 and P2 be 2 distinct predicates of query Q
P1 AND P2
S(P1&P2) = S(P1) * S(P2)
P1 OR P2
S(P1|P2) = S(P1) + S(P2) -[S(P1) * S(P2)]
Index Selectivity for concatenated indexes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Starting with 10.2, when a concatenated index, with all its columns having equality predicates, is used as an access path, the optimizer uses 1/NDK as the selectivity (where NDK is the number of distinct keys in the index).
On 9.2.0.7 and 9.2.0.8 this may be enabled with event 38060 level 1.
On 10.1.0.4 and 10.1.0.5 this may be enabled with event 38059 level 1.
On 10.2 adjustments will be made to the selectivity to account for nulls
in the index keys. This also occurs on 10.1.0.5 (with event 38059.)
Join cardinality
~~~~~~~~~~~~~~~~
Card(Pj) = Card(T1) * Card(T2) * Sel(Pj)
+++ Reference : Oracle support Metalink +++