Saturday, December 19, 2009

Index column ordering and performance

When we are talking about why oracle is not picking up index, one of the well known reason is cardinality. When data is uniform, an index is selective only when cardinality is not very low. When data is not uniform, we can have histograms but here we will keep that feature apart.

So lets assume we want to create a composite index on two columns which both are part of the filter condition. One column (ID) is having high cardinality and another (Status) is having very low cardinality. Then what should be the column sequence for that index? The obivious answer is, for better performance first column should be one which is having high cardinality.

But in reality, any column sequence will give the same performance. To demonstrate lets create following testcase. We will create a table and will choose two columns object_id and status. The column object_id is distinct and update the status column such that it will only two values with same number of rows.


SQL> create table t as select * from dba_objects;

Table created.




SQL> update t set status=decode(mod(rownum,2),0,'VALID','INVALID');

50623 rows updated.




SQL> commit;

Commit complete.




SQL> select status, count(1) from t group by status;

STATUS COUNT(1)
------- ----------
INVALID 25312
VALID 25311




SQL> create index ind_t on t(object_id, status);

Index created.




SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.





SQL> select * from t where object_id=100 and status='INVALID';


Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 94 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_ID"=100 AND "STATUS"='INVALID')


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1207 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


The query executed with 4 consistent gets. Now lets create the index with reverse order and check the performance.


SQL> drop index ind_t;

Index dropped.




SQL> create index ind_t on t(status, object_id);

Index created.




SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.




SQL> select * from t where object_id=100 and status='INVALID';


Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 94 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("STATUS"='INVALID' AND "OBJECT_ID"=100)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1207 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed



Even after changing the column order, the query got executed with 4 consistent gets. So no performance degradation.

So now we have prooved that column order is having no performance on the query.

But this column order may cause performance degradation. Lets assume, In the application, we have some query only with object_id as a filter. If there is no individual index on object_id column, then oracle will use this index and here column ordering will impact.

We will execute the following query with index having order (object_id, status) and lets check the execution plan.



SQL> select * from t where object_id=100;


Execution Plan
----------------------------------------------------------
Plan hash value: 4013845416

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 94 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_T | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_ID"=100)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1207 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed



It is doing Index range scan with 4 consistent gets. Lets reverse the column order (status, object_id) and check the execution plan.


SQL> select * from t where object_id=100;


Execution Plan
----------------------------------------------------------
Plan hash value: 2246305531

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 94 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 94 | 4 (0)| 00:00:01 |
|* 2 | INDEX SKIP SCAN | IND_T | 1 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_ID"=100)
filter("OBJECT_ID"=100)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
1207 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Oracle has opted for index skip scan and consistent gets are also double against our previous execution plan.

In the above example, other column is having only 4 distinct values, but if the distinct values are higher then oracle may ignore this index and will opt for Full table scan.

To sum up, when query is having multiple filter conditions, column order of a composite index on those columns does not matter. But it may have impact on other queries having not all filter conditions.

Friday, December 18, 2009

Query and Transitive Closure

I have got some of the comments on my last blog - "Query Performance & Join Conditions" which were asking about the transitive closure. The example that I have given was not for transitive closure. Oracle has introduced transitive

closure when the literal values are present for joining conditions. For example when I have join condition on two tables and same column of one of the table is having filter condition then other table will use the same filter condition.

To demonstrate, we will see the following test case.


SQL> create table t1 as select rownum "N", rownum*30 "M" from dual connect by level <=10000;

Table created.


SQL> create table t2 as select rownum+20 "R", rownum*50 "T" from dual connect by level <=10000;

Table created.



SQL> create index ind_t1 on t1(n);

Index created.



SQL> create index ind_t2 on t2(r);

Index created.



SQL> exec dbms_stats.gather_table_stats(user,'T1',cascade=>true);

PL/SQL procedure successfully completed.



SQL> exec dbms_stats.gather_table_stats(user,'T2',cascade=>true);

PL/SQL procedure successfully completed.



SQL> set autot trace exp stat
SQL> select n, m, t
2 from t1, t2
3 where t1.n = t2.r
4 and t1.n=30;


Execution Plan
----------------------------------------------------------
Plan hash value: 863060763

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 16 | 4 (0)| 00:00:01 |
| 1 | MERGE JOIN CARTESIAN | | 1 | 16 | 4 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 8 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IND_T1 | 1 | | 1 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 1 | 8 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 8 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IND_T2 | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T1"."N"=30)
6 - access("T2"."R"=30)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
512 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed

Check for "Predicate Information" in above-mentioned execution plan. When I have assigned n=30, it has also picked up r=30. Now one might have question, what is the use of this and will it impact performance?

Based on the input filter condition, oracle may opt for better execution plan.
To demonstrate, lets create t2 table with R column as 4 distinct values and execute the above query again.


SQL> create table t1 as select rownum "N", rownum*30 "M" from dual connect by level <=10000;

Table created.



SQL> create table t2 as select mod(rownum,4) "R", rownum*50 "T" from dual connect by level <=10000;

Table created.



SQL> create index ind_t1 on t1(n);

Index created.



SQL> create index ind_t2 on t2(r);

Index created.



SQL> exec dbms_stats.gather_table_stats(user,'T1',cascade=>true);

PL/SQL procedure successfully completed.



SQL> exec dbms_stats.gather_table_stats(user,'T2',cascade=>true);

PL/SQL procedure successfully completed.



SQL> set autot trace exp stat
SQL> select n, m, t
2 from t1, t2
3 where t1.n = t2.r
4 and t2.r=3;

2500 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2470703826

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2500 | 37500 | 9 (12)| 00:00:01 |
|* 1 | HASH JOIN | | 2500 | 37500 | 9 (12)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 8 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IND_T1 | 1 | | 1 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | T2 | 2500 | 17500 | 6 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T1"."N"="T2"."R")
3 - access("T1"."N"=3)
4 - filter("T2"."R"=3)


As per the above plan, based on filter condition(N=3), oracle opted for index scan. This is because of transitive closure. Now what would have happen if transitive closure was not taken place.
For this, lets modify query as follows -


SQL> select n, m, t
2 from t1, (select /*+ no_merge */ * from t2 where r=3) t2
3 where t1.n = t2.r;

2500 rows selected.



Execution Plan
----------------------------------------------------------
Plan hash value: 157564253

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2500 | 90000 | 14 (8)| 00:00:01 |
|* 1 | HASH JOIN | | 2500 | 90000 | 14 (8)| 00:00:01 |
| 2 | VIEW | | 2500 | 70000 | 6 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T2 | 2500 | 17500 | 6 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T1 | 10000 | 80000 | 7 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T1"."N"="T2"."R")
3 - filter("R"=3)


If we execute same query, oracle would have opted for nested loops.

Conclusion : Transitive closure is the feature which works when we are using filter predicate in query with join condition. Oracle will consider transitive closure with all equility, non-equility predicates.