About Index Selectivity:
In RDBMS databases, like
Oracle, Indexes are used in Oracle to provide quick access to rows in a table.
Indexes provide faster access to data for operations that return a small
portion of a table's rows.
Although Oracle allows an
unlimited number of indexes on a table, the indexes only help if they are used
to speed up queries. Otherwise, they just take up space and add overhead
when the indexed columns are updated. You should use the EXPLAIN PLAN feature
to determine how the indexes are being used in your queries. Sometimes, if an
index is not being used by default, you can use a query hint so that the index
is used.
Basically, B*Tree Indexes
improve the performance of queries that select a small percentage of rows from
a table. As a general guideline, we should create indexes on tables that are
often queried for less than 15%
of the table's rows. This value may be higher in situations
where all data can be retrieved from an index, or where the indexed columns can
be used for joining to other tables.
The ratio of the number of
distinct values
in the indexed column / columns to the number of records in the table represents the
selectivity of an index. The
ideal selectivity is 1. Such a selectivity can be reached only by unique indexes on NOT NULL
columns.
Example with good
Selectivity :
A table having 100'000
records and one of its indexed column has 88000 distinct values, then the selectivity of this index is 88'000 / 10'0000 = 0.88.
Oracle implicitly creates
indexes on the columns of all unique and primary keys that you define with
integrity constraints. These indexes are the most selective and the most
effective in optimizing performance. The selectivity of an index is the
percentage of rows in a table having the same value for the indexed column. An index's selectivity is good if few rows
have the same value.
Example with bad Selectivity
:
lf an index on a table of
100'000 records had only 500 distinct values, then the index's selectivity is
500 / 100'000 = 0.005 and in this case a query which uses the limitation of
such an index will retum 100'000 / 500 = 200 records for each distinct value.
It is evident that a full
table scan is more efficient as using such an index where much
more I/O is needed to scan repeatedly the index and the table.
How to Measure Index
Selectivity ?
Manually measure index
selectivity :
The ratio of the number of
distinct values to the total number of rows is the selectivity of the columns.
This method is useful to estimate the selectivity of an index before creating
it.
SQL> select count
(distinct job) “Distinct Values” from emp;
Distinct Values
---------------
5
Distinct Values
---------------
5
SQL> select count(*)
“Total Number Rows” from emp;
Total Number Rows
-----------------
14
Selectivity = Distinct Values / Total Number Rows
= 5 / 14
= 0.35
Total Number Rows
-----------------
14
Selectivity = Distinct Values / Total Number Rows
= 5 / 14
= 0.35
Automatically measure
index selectivity :
We can determine the
selectivity of an index by dividing the number of distinct indexed values by
the number of rows in the table.
SQL> create index idx_emp_job on emp(job);
SQL> analyze table emp compute statistics;
SQL> analyze table emp compute statistics;
OR
SQL> exec
dbms_stats.gather_table_stats('owner_name','table_name',cascade => TRUE);
SQL> select distinct_keys from user_indexes
where table_name = 'EMP' and index_name = 'IDX_EMP_JOB';
DISTINCT_KEYS
-------------
5
SQL> select
num_rows from user_tables where table_name = 'EMP';SQL> select distinct_keys from user_indexes
where table_name = 'EMP' and index_name = 'IDX_EMP_JOB';
DISTINCT_KEYS
-------------
5
NUM_ROWS
---------
14
Selectivity =
DISTINCT_KEYS / NUM_ROWS = 0.35
Selectivity of each individual Column :
Assuming that the table
has been analyzed it is also possible to query USER_TAB_COLUMNS to investigate
the selectivity of each column individually.
SQL> select column_name, num_distinct from user_tab_columns where table_name = 'EMP';
COLUMN_NAME
NUM_DISTINCT
------------------------------ ------------
EMPNO 14
ENAME 14
JOB 5
MGR 2
HIREDATE 13
SAL 12
COMM 4
DEPTNO 3
------------------------------ ------------
EMPNO 14
ENAME 14
JOB 5
MGR 2
HIREDATE 13
SAL 12
COMM 4
DEPTNO 3
How to choose Composite Indexes ?
A composite index contains
more than one key column. Composite indexes can provide additional advantages
over single column indexes.
Better Selectivity
|
Sometimes two or more columns, each with poor selectivity,
can be combined to form a composite index with good selectivity.
|
Adding Data Storage
|
If all the columns selected by the query are in the
composite index, Oracle can return these values from the index without
accessing the table. However in this case, it's better to use an IOT (Index
Only Table).
|
An SQL statement can use
an access path involving a composite index if the statement contains constructs
that use a leading portion of
the index. A leading portion of an index is a set of one or
more columns that were specified first and consecutively in the list of columns
in the CREATE INDEX statement that created the index. Consider this CREATE
INDEX statement:
SQL> CREATE INDEX
idx_composite ON my_table (x, y, z);
These combinations of
columns are leading portions of the index: X, XY, and XYZ. These combinations
of columns are not leading portions of the index: YZ and Z.
Guidelines for choosing
columns for composite indexes :
Consider creating a composite
index on columns that are frequently used together in WHERE clause conditions
combined with AND operators, especially if their combined selectivity is better
than the selectivity of either column individually. Consider indexing columns
that are used frequently to join tables in SQL statements. Here are basic
guidelines:
1) Use as less as columns
for composite key index
2) Use high cardinality
column in the beginning and least cardinality column at end for the composite
key index.
3) If composite key index
is used very rarely for the queries, better to avoid in OLTP production
databases and same can be created in replicated / logical standby / report
databases.
I hope this document may help you to optimize your queries more effectively.