Tuesday, September 22, 2009

Query Performance & Join Conditions

Oracle database performance is largely driven by the application queries that consume host resources. Oracle generates an execution plan based on the input values and predicates, and that plan determines how efficiently resources are used.

While working on a performance issue, a user reported that one of the modules was not functioning and seemed to have a problem with a query. When we traced the session, we found a query doing a full table scan (FTS) while retrieving only a few records. It was one of the top resource-consuming queries in the trace file.

The query looked quite normal, so we checked the indexes, the table statistics, and so on. The statistics had been gathered recently, so they were not the problem.

After investigation, it turned out that one of the join conditions was not correctly defined, which the developer confirmed. When a query joins multiple tables, we have to be very careful about the equality predicates.

Here I want to remind you of a fact about Oracle:
By the rules of mathematics, when a = b and b = c, we can conclude a = c. But in Oracle this is not the case for join columns. The optimizer does not conclude a = c on its own.
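As a quick sketch (the table and column names here are hypothetical, not from the production query), consider three tables joined on the same column:

SELECT t1.*, t2.*, t3.*
  FROM t1, t2, t3
 WHERE t1.id = t2.id
   AND t2.id = t3.id;

Mathematically, these two predicates imply t1.id = t3.id, but the optimizer will not necessarily generate that third predicate by itself. If an efficient plan depends on joining t1 directly to t3, we have to write that condition explicitly.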

To illustrate this, let us create a test case -

Create two tables, insert some random records into them, and create the required indexes. Since the query came from production, I am not using the real table names, and I have extracted only the problematic part of the query.


SQL> create table cu_all (custid number, addr varchar2(200), ph number, cano number, acctype varchar2(10));

Table created.

SQL> create table ca_receipt (custid number, caamt number, cadt date, totbal number);

Table created.

SQL>
SQL> insert into cu_all
2 select lvl,
3 dbms_random.string('A',30),
4 round(dbms_random.value(1,100000)),
5 round(dbms_random.value(1,10000)),
6 dbms_random.string('A',10)
7 from (select level "LVL" from dual connect by level <=200000);

200000 rows created.

SQL> insert into ca_receipt
2 select round(dbms_random.value(1,10000)),
3 round(dbms_random.value(1,100000)),
4 sysdate - round(dbms_random.value(1,100000)),
5 round(dbms_random.value(1,100000))
6 from (select level "LVL" from dual connect by level <=500000);

500000 rows created.

SQL> create unique index pk_cu_all_ind on cu_all(custid);

Index created.

SQL> create index ind2_cu_all on cu_all(CANO);

Index created.

SQL> create index ind_ca_receipt_custid on ca_receipt(custid);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'CU_ALL', cascade=>true);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user,'CA_RECEIPT', cascade=>true);

PL/SQL procedure successfully completed.



Now let us execute the query with tracing on. This is similar to the query that was provided to me.



SQL> set autot trace
SQL> SELECT ca.*, cu.*
2 FROM ca_receipt CA,
3 cu_all CU
4 WHERE CA.CUSTID = CU.CUSTID
5 AND CA.CUSTID IN (SELECT CUSTID FROM cu_all START WITH custid = 2353
6 CONNECT BY PRIOR CUSTID = CANO)
7 ORDER BY ACCTYPE DESC;

289 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3186098611

-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 81000 | 504 (2)| 00:00:07 |
| 1 | SORT ORDER BY | | 1000 | 81000 | 504 (2)| 00:00:07 |
|* 2 | HASH JOIN | | 1000 | 81000 | 503 (2)| 00:00:07 |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 1000 | 26000 | 112 (1)| 00:00:02 |
| 5 | VIEW | VW_NSO_1 | 20 | 100 | 21 (0)| 00:00:01 |
| 6 | HASH UNIQUE | | 20 | 180 | | |
|* 7 | CONNECT BY WITH FILTERING | | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID | CU_ALL | 1 | 9 | 2 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | PK_CU_ALL_IND | 1 | | 1 (0)| 00:00:01 |
| 10 | NESTED LOOPS | | | | | |
| 11 | CONNECT BY PUMP | | | | | |
| 12 | TABLE ACCESS BY INDEX ROWID| CU_ALL | 20 | 180 | 21 (0)| 00:00:01 |
|* 13 | INDEX RANGE SCAN | IND2_CU_ALL | 20 | | 1 (0)| 00:00:01 |
|* 14 | INDEX RANGE SCAN | IND_CA_RECEIPT_CUSTID | 50 | | 2 (0)| 00:00:01 |
| 15 | TABLE ACCESS BY INDEX ROWID | CA_RECEIPT | 50 | 1050 | 52 (0)| 00:00:01 |
| 16 | TABLE ACCESS FULL | CU_ALL | 200K| 10M| 389 (1)| 00:00:05 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CA"."CUSTID"="CU"."CUSTID")
7 - access("CANO"=PRIOR "CUSTID")
9 - access("CUSTID"=2353)
13 - access("CANO"=PRIOR "CUSTID")
14 - access("CA"."CUSTID"="CUSTID")


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2249 consistent gets
25 physical reads
0 redo size
11748 bytes sent via SQL*Net to client
729 bytes received via SQL*Net from client
21 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
289 rows processed

If you look at the query, it seems to be a normal one.

But here is the problem:

The query involves two tables, CA and CU. The subquery on CU_ALL fetches the hierarchy of customer IDs and is joined to the CA table (CA.CUSTID IN (...)), and CA in turn joins to CU on the same column (CA.CUSTID = CU.CUSTID). Because the subquery is applied to CA, the cardinality estimate for the join changes, and the optimizer opts for a full table scan when joining back to the CU table.
This was causing the performance bottleneck. So to resolve the issue, I changed the join condition: the subquery now filters CU.CUSTID instead of CA.CUSTID.

Now if we check, the following is the proper execution plan. The consistent gets have also been reduced to 797, against 2249 for the original query.


SQL> SELECT ca.*, cu.*
2 FROM ca_receipt CA,
3 cu_all CU
4 WHERE CA.CUSTID = CU.CUSTID
5 AND CU.CUSTID IN (SELECT CUSTID FROM cu_all START WITH custid = 2353
6 CONNECT BY PRIOR CUSTID = CANO)
7 ORDER BY ACCTYPE DESC;

289 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3713271440

-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 81000 | 133 (2)| 00:00:02 |
| 1 | SORT ORDER BY | | 1000 | 81000 | 133 (2)| 00:00:02 |
| 2 | NESTED LOOPS | | | | | |
| 3 | NESTED LOOPS | | 1000 | 81000 | 132 (1)| 00:00:02 |
| 4 | NESTED LOOPS | | 20 | 1200 | 42 (3)| 00:00:01 |
| 5 | VIEW | VW_NSO_1 | 20 | 100 | 21 (0)| 00:00:01 |
| 6 | HASH UNIQUE | | 20 | 180 | | |
|* 7 | CONNECT BY WITH FILTERING | | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID | CU_ALL | 1 | 9 | 2 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | PK_CU_ALL_IND | 1 | | 1 (0)| 00:00:01 |
| 10 | NESTED LOOPS | | | | | |
| 11 | CONNECT BY PUMP | | | | | |
| 12 | TABLE ACCESS BY INDEX ROWID| CU_ALL | 20 | 180 | 21 (0)| 00:00:01 |
|* 13 | INDEX RANGE SCAN | IND2_CU_ALL | 20 | | 1 (0)| 00:00:01 |
| 14 | TABLE ACCESS BY INDEX ROWID | CU_ALL | 1 | 55 | 1 (0)| 00:00:01 |
|* 15 | INDEX UNIQUE SCAN | PK_CU_ALL_IND | 1 | | 0 (0)| 00:00:01 |
|* 16 | INDEX RANGE SCAN | IND_CA_RECEIPT_CUSTID | 50 | | 2 (0)| 00:00:01 |
| 17 | TABLE ACCESS BY INDEX ROWID | CA_RECEIPT | 50 | 1050 | 52 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

7 - access("CANO"=PRIOR "CUSTID")
9 - access("CUSTID"=2353)
13 - access("CANO"=PRIOR "CUSTID")
15 - access("CU"."CUSTID"="CUSTID")
16 - access("CA"."CUSTID"="CU"."CUSTID")


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
797 consistent gets
1 physical reads
0 redo size
11748 bytes sent via SQL*Net to client
729 bytes received via SQL*Net from client
21 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
289 rows processed


Hence, to sum up: Oracle gives us output based on the input values and conditions we supply, and the plan it generates depends on them. When multiple tables are involved, a query must have properly defined join conditions, because the optimizer will not always infer a missing condition transitively.
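For reference, the essence of the fix above was a change of a single identifier in the subquery predicate, moving the filter from table CA to table CU:

-- Original (full table scan on CU_ALL):
AND CA.CUSTID IN (SELECT CUSTID FROM cu_all START WITH custid = 2353
                  CONNECT BY PRIOR CUSTID = CANO)

-- Fixed (index access on CU_ALL):
AND CU.CUSTID IN (SELECT CUSTID FROM cu_all START WITH custid = 2353
                  CONNECT BY PRIOR CUSTID = CANO)

Both forms return the same 289 rows, since CA.CUSTID = CU.CUSTID is also in the WHERE clause, but only the second lets the optimizer drive the unique index on CU_ALL(CUSTID).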