Saturday, December 19, 2009

Index column ordering and performance

When we are talking about why oracle is not picking up index, one of the well known reason is cardinality. When data is uniform, an index is selective only when cardinality is not very low. When data is not uniform, we can have histograms but here we will keep that feature apart.

So lets assume we want to create a composite index on two columns which both are part of the filter condition. One column (ID) is having high cardinality and another (Status) is having very low cardinality. Then what should be the column sequence for that index? The obivious answer is, for better performance first column should be one which is having high cardinality.

But in reality, any column sequence will give the same performance. To demonstrate lets create following testcase. We will create a table and will choose two columns object_id and status. The column object_id is distinct and update the status column such that it will only two values with same number of rows.


SQL> create table t as select * from dba_objects;

Table created.




SQL> update t set status=decode(mod(rownum,2),0,'VALID','INVALID');

50623 rows updated.




SQL> commit;

Commit complete.




SQL> select status, count(1) from t group by status;

STATUS COUNT(1)
------- ----------
INVALID 25312
VALID 25311




SQL> create index ind_t on t(object_id, status);

Index created.




SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.





SQL> select * from t where object_id=100 and status='INVALID';


Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 94 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_ID"=100 AND "STATUS"='INVALID')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1207 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


The query executed with 4 consistent gets. Now lets create the index with reverse order and check the performance.


SQL> drop index ind_t;

Index dropped.




SQL> create index ind_t on t(status, object_id);

Index created.




SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.




SQL> select * from t where object_id=100 and status='INVALID';


Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 94 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("STATUS"='INVALID' AND "OBJECT_ID"=100)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1207 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed



Even after changing the column order, the query got executed with 4 consistent gets. So no performance degradation.

So now we have prooved that column order is having no performance on the query.

But this column order may cause performance degradation. Lets assume, In the application, we have some query only with object_id as a filter. If there is no individual index on object_id column, then oracle will use this index and here column ordering will impact.

We will execute the following query with index having order (object_id, status) and lets check the execution plan.



SQL> select * from t where object_id=100;


Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 94 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_ID"=100)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1207 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed



It is doing Index range scan with 4 consistent gets. Lets reverse the column order (status, object_id) and check the execution plan.


SQL> select * from t where object_id=100;


Execution Plan
----------------------------------------------------------
Plan hash value: 2246305531

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 94 | 4 (0)| 00:00:01 |
|* 2 | INDEX SKIP SCAN | IND_T | 1 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_ID"=100)
filter("OBJECT_ID"=100)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
1207 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Oracle has opted for index skip scan and consistent gets are also double against our previous execution plan.

In the above example, other column is having only 4 distinct values, but if the distinct values are higher then oracle may ignore this index and will opt for Full table scan.

To sum up, when query is having multiple filter conditions, column order of a composite index on those columns does not matter. But it may have impact on other queries having not all filter conditions.

No comments: