What a bitmap join Index can do for you
Intoduction
You might have used bitmap indexes in data warehouse environment for the columns in which ratio of number of distinct values to number of rows in the table is very small but you might not have used extended functionality of bitmap join indexes to overcome query performance issues. In this article I would give you an overview of using bitmap join index with examples. Bitmap indexes are not explained in much detail as they are out of scope of this article.How It Works
An index provides pointers to the rows in a table that contain a given key value. A regular index stores a list of rowids for each key corresponding to the rows with that key value. In a bitmap index, a bitmap for each key value replaces a list of rowids.
Each bit in the bitmap corresponds to a possible rowid, and if the bit is set, it means that the row with the corresponding rowid contains the key value. A mapping function converts the bit position to an actual rowid, so that the bitmap index provides the same functionality as a regular index. If the number of different key values is small, bitmap indexes save space.
Bitmap indexes are most effective for queries that contain multiple conditions in the WHERE clause. Rows that satisfy some, but not all, conditions are filtered out before the table itself is accessed. This improves response time, often dramatically.
Bitmap Join Indexes extend the functionality of regular bitmap indexes by allowing the index creation on the join condition of query. Bitmap join index is basically a bitmap index on the join of two or more tables. For each value in a column of a table, a bitmap join index stores the rowids of corresponding rows in one or more other tables. In a data warehousing environment, the join condition is an equi-inner join between the primary key column or columns of the dimension tables and the foreign key column or columns in the fact table.
Bitmap join indexes are much more efficient in storage than materialized join views, an alternative for materializing joins in advance. This is because the materialized join views do not compress the rowids of the fact tables
Too much of theory, its time to see an example now
I recently tuned below query in one of the production system. You can see that below query is written in very inefficient manner and no wonder why cost is so high in execution plan.
SELECT COUNT (*) COUNT
FROM (SELECT a.*, ROWNUM rnum
FROM ( SELECT a.parent_item_id id,
a.soonest_expiration soonest_expiration,
a.quantity,
b.subsample_parent_id subsample_parent_id,
c.symbol units,
d.name,
e.parent_item_id,
f.name location,
f.PATH,
g.current_state_id,
h.name state_name,
h.action_count,
property_header1ss.property_header1,
property_header2ss.property_header2,
property_header3ss.property_header3,
property_header4ss.property_header4,
property_header5ss.property_header5,
property_header6ss.property_header6,
property_header7ss.property_header7,
property_header8ss.property_header8,
property_header9ss.property_header9,
property_header10ss.property_header10,
property_header11ss.property_header11,
property_header12ss.property_header12,
property_header13ss.property_header13,
property_header14ss.property_header14,
property_header15ss.property_header15,
property_header16ss.property_header16,
property_header17ss.property_header17,
property_header18ss.property_header18
FROM consumables$aud a,
consumables b,
units$aud c,
templates$aud d,
consumable_types$aud e,
locations$aud f,
wf_instances$aud g,
wf_states$aud h,
(SELECT a.VALUE property_header1, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_739579')
AND a.descriptor_id = c.id) property_header1ss,
(SELECT a.VALUE property_header2, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_772637')
AND a.descriptor_id = c.id) property_header2ss,
(SELECT a.VALUE property_header3, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_739685')
AND a.descriptor_id = c.id) property_header3ss,
(SELECT a.VALUE property_header4, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property879375853')
AND a.descriptor_id = c.id) property_header4ss,
(SELECT a.VALUE property_header5, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_739806')
AND a.descriptor_id = c.id) property_header5ss,
(SELECT a.VALUE property_header6, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_739947')
AND a.descriptor_id = c.id) property_header6ss,
(SELECT a.VALUE property_header7, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_740018')
AND a.descriptor_id = c.id) property_header7ss,
(SELECT a.VALUE property_header8, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_740091')
AND a.descriptor_id = c.id) property_header8ss,
(SELECT a.VALUE property_header9, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_740251')
AND a.descriptor_id = c.id) property_header9ss,
(SELECT a.VALUE property_header10, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_740338')
AND a.descriptor_id = c.id) property_header10ss,
(SELECT a.VALUE property_header11, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_740428')
AND a.descriptor_id = c.id) property_header11ss,
(SELECT a.VALUE property_header12, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_817093')
AND a.descriptor_id = c.id) property_header12ss,
(SELECT a.VALUE property_header13, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_817204')
AND a.descriptor_id = c.id) property_header13ss,
(SELECT a.VALUE property_header14, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_817317')
AND a.descriptor_id = c.id) property_header14ss,
(SELECT a.VALUE property_header15, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_818108')
AND a.descriptor_id = c.id) property_header15ss,
(SELECT a.VALUE property_header16, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_818230')
AND a.descriptor_id = c.id) property_header16ss,
(SELECT a.VALUE property_header17, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_1868863741')
AND a.descriptor_id = c.id) property_header17ss,
(SELECT a.VALUE property_header18, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_895380')
AND a.descriptor_id = c.id) property_header18ss
WHERE b.id = a.parent_item_id
AND f.id = a.location_id
AND a.template_id = d.id
AND a.in_case = 0
AND b.site_id = 107132
AND d.consumable_type_id = e.id
AND c.id = e.units_id
AND g.id = a.workflow_instance_id
AND h.id = g.current_state_id
AND a.version = (SELECT version
FROM consumables
WHERE id = a.parent_item_id)
AND h.action_count != 0
AND property_header1ss.owner_id(+) = a.id
AND property_header2ss.owner_id(+) = a.id
AND property_header3ss.owner_id(+) = a.id
AND property_header4ss.owner_id(+) = a.id
AND property_header5ss.owner_id(+) = a.id
AND property_header6ss.owner_id(+) = a.id
AND property_header7ss.owner_id(+) = a.id
AND property_header8ss.owner_id(+) = a.id
AND property_header9ss.owner_id(+) = a.id
AND property_header10ss.owner_id(+) = a.id
AND property_header11ss.owner_id(+) = a.id
AND property_header12ss.owner_id(+) = a.id
AND property_header13ss.owner_id(+) = a.id
AND property_header14ss.owner_id(+) = a.id
AND property_header15ss.owner_id(+) = a.id
AND property_header16ss.owner_id(+) = a.id
AND property_header17ss.owner_id(+) = a.id
AND property_header18ss.owner_id(+) = a.id
ORDER BY UPPER (d.name) ASC) a
WHERE parent_item_id = 155550 AND 1 = 1) a
WHERE rnum >= 0
SQL> select * from table(dbms_xplan.display());
The contents of execution plan has been truncated as it was quite lengthy.
PLAN_TABLE_OUTPUT
Plan hash value: 166816580
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | | 176M (1)|127:23:26 |
| 1 | SORT AGGREGATE | | 1 | 13 | | | |
| 2 | VIEW | | 77 | 1001 | | 176M (1)|127:23:26 |
| 3 | COUNT | | | | | | |
| 4 | VIEW | | 77 | | | 176M (1)|127:23:26 |
| 5 | SORT ORDER BY | | 77 | 2730K| 232G| 176M (1)|127:23:26 |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
| 6 | FILTER | | | | | | |
| 7 | HASH JOIN RIGHT OUTER | | 15M| 515G| 219M| 11M (1)| 08:18:56 |
| 8 | VIEW | | 113K| 218M| | 1348 (4)| 00:00:04 |
| 9 | HASH JOIN | | 113K| 4562K| | 1348 (4)| 00:00:04 |
| 10 | TABLE ACCESS FULL | PROPERTY_DESCRIPTORS$AUD | 17 | 391 | | 26 (0)| 00:00:01 |
| 11 | TABLE ACCESS FULL | PROPERTY_VALUES$AUD | 589K| 10M| | 1308 (3)| 00:00:04 |
| 12 | HASH JOIN RIGHT OUTER | | 4811K| 153G| 219M| 3705K (1)| 02:40:13 |
| 13 | VIEW | | 113K| 218M| | 1348 (4)| 00:00:04 |
| 14 | HASH JOIN | | 113K| 4562K| | 1348 (4)| 00:00:04 |
| 15 | TABLE ACCESS FULL | PROPERTY_DESCRIPTORS$AUD | 17 | 391 | | 26 (0)| 00:00:01 |
| 16 | TABLE ACCESS FULL | PROPERTY_VALUES$AUD | 589K| 10M| | 1308 (3)| 00:00:04 |
122 rows selected.
In the above query you clearly see that below subquery was used multiple times for the same tables with different values of column 'safe_name'.
(SELECT a.VALUE property_header1, a.owner_id
FROM property_values$aud a,
property_descriptors$aud c
WHERE c.safe_name IN ('Property_739579')
One thing we could do here was merge these multiple subqueries in single subquery and use 'case when' in outer select query which would have dramatically improved the query performance.
But again this query is something running inside vendor provided cots packages and application team rejected the suggestion of rewriting the query as below:
SQL> explain plan for SELECT COUNT (*) COUNT
2 FROM (SELECT a.*, ROWNUM rnum
3 FROM ( SELECT a.parent_item_id id,
4 a.soonest_expiration soonest_expiration,
5 a.quantity,
6 b.subsample_parent_id subsample_parent_id,
7 c.symbol units,
8 d.name,
9 e.parent_item_id,
10 f.name location,
11 f.PATH,
12 g.current_state_id,
13 h.name state_name,
14 h.action_count
/* use here case when statement to display various property headers */
15 FROM consumables$aud a,
16 consumables b,
17 units$aud c,
18 templates$aud d,
19 consumable_types$aud e,
20 locations$aud f,
21 wf_instances$aud g,
22 wf_states$aud h,
23 (SELECT a.VALUE property_header, a.owner_id
24 FROM property_values$aud a,
25 property_descriptors$aud c
26 WHERE c.safe_name IN ('Property_739579','Property_772637','Property_739685','Property879375853','Property_739806','Property_739947','Property_740091','Property_740251','Property_740338','Property_740251','Property_740338
27 AND a.descriptor_id = c.id) property_header
28 WHERE b.id = a.parent_item_id
29 AND f.id = a.location_id
30 AND a.template_id = d.id
31 AND a.in_case = 0
32 AND b.site_id = 107132
33 AND d.consumable_type_id = e.id
34 AND c.id = e.units_id
35 AND g.id = a.workflow_instance_id
36 AND h.id = g.current_state_id
37 AND a.version = (SELECT version
38 FROM consumables
39 WHERE id = a.parent_item_id)
40 AND h.action_count != 0
41 AND property_header.owner_id(+) = a.id
42 ORDER BY UPPER (d.name) ASC) a
43 WHERE parent_item_id = 155550 AND 1 = 1) a
44 WHERE rnum >= 0
45 /
Explained.
SQL>
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
Plan hash value: 4236004457
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | | 26734 (1)| 00:01:10 |
| 1 | SORT AGGREGATE | | 1 | 13 | | | |
| 2 | VIEW | | 1 | 13 | | 26734 (1)| 00:01:10 |
| 3 | COUNT | | | | | | |
| 4 | VIEW | | 1 | | | 26734 (1)| 00:01:10 |
| 5 | SORT ORDER BY | | 1 | 177 | 7080K| 26734 (1)| 00:01:10 |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
| 6 | FILTER | | | | | | |
| 7 | HASH JOIN OUTER | | 19433 | 3359K| | 1629 (5)| 00:00:05 |
| 8 | HASH JOIN | | 1185 | 197K| | 286 (4)| 00:00:01 |
| 9 | TABLE ACCESS FULL | WF_STATES$AUD | 791 | 15029 | | 7 (0)| 00:00:01 |
| 10 | HASH JOIN | | 1185 | 175K| | 278 (3)| 00:00:01 |
| 11 | HASH JOIN | | 1185 | 163K| | 133 (4)| 00:00:01 |
| 12 | TABLE ACCESS FULL | LOCATIONS$AUD | 32 | 928 | | 4 (0)| 00:00:01 |
| 13 | HASH JOIN | | 1185 | 129K| | 128 (3)| 00:00:01 |
| 14 | TABLE ACCESS BY INDEX ROWID | CONSUMABLES$AUD | 720 | 27360 | | 83 (2)| 00:00:01 |
| 15 | NESTED LOOPS | | 1286 | 110K| | 94 (2)| 00:00:01 |
| 16 | HASH JOIN | | 2 | 100 | | 8 (13)| 00:00:01 |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
| 17 | NESTED LOOPS | | 1 | 23 | | 3 (0)| 00:00:01 |
| 18 | TABLE ACCESS BY INDEX ROWID| CONSUMABLE_TYPES$AUD | 1 | 15 | | 2 (0)| 00:00:01 |
| 19 | INDEX RANGE SCAN | CON_TYPES_PARENT_VERSION | 1 | | | 1 (0)| 00:00:01 |
| 20 | INDEX UNIQUE SCAN | SYS_C003779 | 1 | | | 0 (0)| 00:00:01 |
| 21 | TABLE ACCESS FULL | TEMPLATES$AUD | 50 | 1350 | | 4 (0)| 00:00:01 |
| 22 | INDEX RANGE SCAN | CONSUMABLES$AUD_TEMPLATEID | 2572 | | | 4 (0)| 00:00:01 |
| 23 | TABLE ACCESS FULL | CONSUMABLES | 10443 | 244K| | 33 (4)| 00:00:01 |
| 24 | TABLE ACCESS FULL | WF_INSTANCES$AUD | 35991 | 386K| | 144 (3)| 00:00:01 |
| 25 | VIEW | | 589K| 3455K| | 1329 (4)| 00:00:04 |
| 26 | HASH JOIN | | 589K| 19M| | 1329 (4)| 00:00:04 |
| 27 | INDEX FAST FULL SCAN | PROPERTY_DESCRIPTORS$ID_SAFE | 111 | 2553 | | 7 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
| 28 | TABLE ACCESS FULL | PROPERTY_VALUES$AUD | 589K| 6335K| | 1308 (3)| 00:00:04 |
| 29 | TABLE ACCESS BY INDEX ROWID | CONSUMABLES | 1 | 9 | | 2 (0)| 00:00:01 |
| 30 | INDEX UNIQUE SCAN | SYS_C003629 | 1 | | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------------------
The only other solution I was left with - create bitmap join index on the common join condition being used multiple times in the above query and that's what I did. I created below index and performance you can see yourself. Cost of execution plan dramatically reduced from 176M to 37K and the query which was not executing at all started giving results in less than a minute.
alter session set current_schema = dbuser;
create bitmap index propval$desc_bm
on property_values$aud (property_descriptors$aud.safe_name)
from property_values$aud, property_descriptors$aud
where property_values$aud.descriptor_id = property_descriptors$aud.id;
The index now contains the bitmap for column value 'safe_name' and since it contains bitmaps only for matching rowids returned by join condition, size of index will be comparatively smaller and hence the index is more efficient in comparison to regular bitmap index.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
Plan hash value: 2868485995
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | | 37748 (1)| 00:01:38 |
| 1 | SORT AGGREGATE | | 1 | 13 | | | |
|* 2 | VIEW | | 1 | 13 | | 37748 (1)| 00:01:38 |
| 3 | COUNT | | | | | | |
| 4 | VIEW | | 1 | | | 37748 (1)| 00:01:38 |
| 5 | SORT ORDER BY | | 1 | 36315 | 18M| 37748 (1)| 00:01:38 |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
|* 6 | FILTER | | | | | | |
|* 7 | HASH JOIN RIGHT OUTER | | 1194 | 41M| 3656K| 27511 (1)| 00:01:12 |
| 8 | VIEW | | 1852 | 3631K| | 369 (1)| 00:00:01 |
| 9 | TABLE ACCESS BY INDEX ROWID | PROPERTY_VALUES$AUD | 1852 | 33336 | | 369 (1)| 00:00:01 |
| 10 | BITMAP CONVERSION TO ROWIDS | | | | | | |
|* 11 | BITMAP INDEX SINGLE VALUE | PROPVAL$DESC_BM | | | | | |
|* 12 | HASH JOIN RIGHT OUTER | | 1194 | 39M| 3656K| 25022 (1)| 00:01:05 |
| 13 | VIEW | | 1852 | 3631K| | 369 (1)| 00:00:01 |
| 14 | TABLE ACCESS BY INDEX ROWID | PROPERTY_VALUES$AUD | 1852 | 33336 | | 369 (1)| 00:00:01 |
| 15 | BITMAP CONVERSION TO ROWIDS | | | | | | |
|* 16 | BITMAP INDEX SINGLE VALUE | PROPVAL$DESC_BM | | | | | |
Limitation of bitmap join indexes
Join results must be stored, therefore, bitmap join indexes have the following restrictions:
1.) Parallel DML is currently only supported on the fact table. Parallel
DML on one of the participating dimension tables will mark the index as
unusable.
2.) Only one table can be updated concurrently by different transactions when using the bitmap join index.
3.) No table can appear twice in the join.
4.) You cannot create a bitmap join index on an index-organized table or a temporary table.
5.) The columns in the index must all be columns of the dimension tables.
6.) The dimension table join columns must be either primary key columns or have unique constraints.
7.) If a dimension table has composite primary key, each column in the primary key must be part of the join.