Oracle Experiences and information: Do we need stats on all columns?

Recently I visited a customer for a performance review. Under their current stats-gathering policy, they use the following call to gather stats on tables.


dbms_stats.gather_table_stats (user, tabname, cascade=>true, method_opt=>'FOR ALL INDEXED COLUMNS SIZE 75');

As the command shows, method_opt asks for stats on all indexed columns with a bucket size of 75. Stats on the non-indexed columns are therefore not collected. This is a bad practice: column stats are crucial, particularly for columns that are used in queries.

When I discussed this with the customer, they held the following misconceptions.

1. Stats on the indexed columns are enough to decide whether to use an index scan.
2. When a column has no index, Oracle does not need stats on it, because it will use a full table scan (FTS) for such queries anyway.

This is purely a misconception. Even if a column used in a query is not indexed, a correct cardinality estimate for it is crucial in determining the correct join order. To prove the point, I created the following small test case.

I created two tables (test1 and test2) with an index on each.


SQL> create table test1 as
  2  select mod(LVL, 3)  ID, LVL VAL from
  3  (select level "LVL" from dual connect by level <= 100000);

Table created.

SQL> create index ind_test1 on test1 (id);

Index created.

SQL> create table test2 as
  2  select lvl "ID", dbms_random.string('A',10) "STR" from (select
  3  level "LVL" from dual connect by level <=50000);

Table created.

SQL> create index ind_test2 on test2 (id);

Index created.

Let us generate optimizer statistics following the customer's policy.


SQL> exec dbms_stats.gather_table_stats (user, 'TEST1', cascade=>true, method_opt=>'FOR ALL INDEXED COLUMNS SIZE 75');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats (user, 'TEST2', cascade=>true, method_opt=>'FOR ALL INDEXED COLUMNS SIZE 75');

PL/SQL procedure successfully completed.

The stats on the tables are:



SQL> select owner, table_name, partitioned, num_rows, blocks
  2  from dba_tables
  3  where table_name in ('TEST1','TEST2')
  4  and   owner = 'TEST';

OWNER      TABLE_NAME PAR   NUM_ROWS     BLOCKS
---------- ---------- --- ---------- ----------
TEST       TEST1      NO      100000        186

TEST       TEST2      NO       50000        156


break on table_name skip 1
select table_name, column_name,num_distinct, num_nulls, density
from dba_tab_columns
where table_name in ('TEST1','TEST2')
and   owner='TEST'
order by table_name;

TABLE_NAME COLUMN_NAM NUM_DISTINCT  NUM_NULLS    DENSITY
---------- ---------- ------------ ---------- ----------
TEST1      ID                    3          0    .000005
           VAL             

TEST2      ID                50000          0     .00002
           STR

As requested in the dbms_stats call, stats are collected only on the indexed columns and not on the non-indexed columns.
The following query is used to check the execution plan.


SQL> set autot trace
SQL> select a.id, a.val, b.str
  2  from    test1 a,
  3          test2 b
  4  where   a.id = b.id
  5  and     a.val = 40;

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=46 Card=667 Bytes=77372)
   1    0   HASH JOIN (Cost=46 Card=667 Bytes=77372)
   2    1     TABLE ACCESS (FULL) OF 'TEST1' (Cost=19 Card=1000 Bytes=16000)
   3    1     TABLE ACCESS (FULL) OF 'TEST2' (Cost=17 Card=50000 Bytes=5000000)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        327  consistent gets
        315  physical reads
          0  redo size
        489  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Now, let us delete the statistics and gather new ones on the same tables.


SQL> exec dbms_stats.delete_table_stats (user, 'TEST1');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.delete_table_stats (user, 'TEST2');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats (user, 'TEST1', cascade=>true, method_opt=>'FOR ALL COLUMNS');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats (user, 'TEST2', cascade=>true, method_opt=>'FOR ALL COLUMNS');

PL/SQL procedure successfully completed.

Let us check the stats again.


TABLE_NAME COLUMN_NAM NUM_DISTINCT  NUM_NULLS    DENSITY
---------- ---------- ------------ ---------- ----------
TEST1      ID                    3          0    .000005
           VAL              100000          0     .00001

TEST2      ID                50000          0     .00002
           STR               50000          0     .00002

The plan for the same query is:


SQL> select a.id, a.val, b.str
  2  from    test1 a,
  3          test2 b
  4  where   a.id = b.id
  5  and     a.val = 40;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=21 Card=1 Bytes=22)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST2' (Cost=2 Card=1 Bytes=15)
   2    1     NESTED LOOPS (Cost=21 Card=1 Bytes=22)
   3    2       TABLE ACCESS (FULL) OF 'TEST1' (Cost=19 Card=1 Bytes=7)
   4    2       INDEX (RANGE SCAN) OF 'IND_TEST2' (NON-UNIQUE) (Cost=1 Card=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        183  consistent gets
        174  physical reads
          0  redo size
        489  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The two plans compare as follows:

Plan when stats are gathered  Cardinality of TEST1 table  Consistent Gets
----------------------------  --------------------------  ---------------
Only on indexed columns                             1000              327
On all columns                                         1              183

Why is there such a difference between the two execution plans?

Cardinality plays a very big role in the choice of an execution plan. In the first case, stats are not available on the VAL column of table TEST1, so the optimizer has to fall back on a default.

The cardinality when statistics are available only on indexed columns

The filter condition is on the VAL column of table TEST1, and as there are no stats available on this column, Oracle takes the default selectivity of 1/100 and calculates the cardinality as


num_rows*selectivity = 100000*(1/100) = 1000
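
The arithmetic above can be reproduced with a small sketch. It is written in Python purely as a calculator; the function and parameter names are hypothetical and are not part of any Oracle API. It assumes an equality filter, no histogram effects, and the 1/100 default selectivity mentioned in the text.

```python
# Simplified model of a single-table cardinality estimate for "col = constant".
# DEFAULT_EQ_SELECTIVITY is the 1/100 fallback described in the text; the
# function and parameter names are illustrative only, not an Oracle API.
DEFAULT_EQ_SELECTIVITY = 1 / 100


def estimate_rows(num_rows, num_distinct=None):
    """Estimate the rows returned by an equality filter on one column."""
    if num_distinct is None:
        # No column statistics: fall back to the default selectivity.
        selectivity = DEFAULT_EQ_SELECTIVITY
    else:
        # Column statistics present (histograms ignored here): 1 / NUM_DISTINCT.
        selectivity = 1 / num_distinct
    # An estimate is never reported as less than one row.
    return max(1, round(num_rows * selectivity))


# TEST1 has 100000 rows and no statistics on VAL.
print(estimate_rows(100000))  # 1000
```

Passing a known NUM_DISTINCT as the second argument switches the sketch from the default guess to 1/NUM_DISTINCT.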

As per the plan, TEST1 is the driving table. Because its estimated cardinality is very high, Oracle opted for an FTS of the TEST2 table, whose cardinality is calculated as


num_rows = 50000

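When stats are available on all columns of both tables, the cardinality for the TEST1 table is calculated from the selectivity of the filter on the VAL column. The ID column appears only in the join condition, so it does not reduce the single-table estimate.


Num_rows * "VAL column selectivity"
= round (100000 * (1/100000))
= 1

If such a calculation rounds down to 0, Oracle still uses 1, as a cardinality cannot be 0.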

As the cardinality for the TEST1 table is very low (1), Oracle opts for an index scan on TEST2, and the cardinality is calculated as


Num_rows * (1/distinct) 
= round (50000 * (1/50000)) 
= 1
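
Why the TEST1 estimate flips the join method can be sketched with the textbook approximation of nested-loops cost: the cost of the driving row source plus one inner probe per estimated driving row. The costs below are read from the two plans shown earlier. The formula is a simplification rather than Oracle's actual costing, and holding the hash join cost fixed at 46 is an assumption made only to keep the comparison short. Python is used purely as a calculator, and the names are hypothetical.

```python
# Costs copied from the two execution plans in this post.
# The cost formula is a textbook simplification, assumed for illustration only.
FTS_TEST1_COST = 19    # full scan of TEST1 (same in both plans)
PROBE_TEST2_COST = 2   # TEST2 access by index rowid, per driving row
HASH_JOIN_COST = 46    # total cost reported for the hash join plan


def nested_loops_cost(outer_cost, outer_rows, probe_cost):
    """Driving row source cost plus one inner probe per estimated outer row."""
    return outer_cost + outer_rows * probe_cost


for label, test1_rows in (("indexed columns only", 1000), ("all columns", 1)):
    nl_cost = nested_loops_cost(FTS_TEST1_COST, test1_rows, PROBE_TEST2_COST)
    choice = "NESTED LOOPS" if nl_cost < HASH_JOIN_COST else "HASH JOIN"
    print(f"{label}: nested loops ~{nl_cost} vs hash join {HASH_JOIN_COST} -> {choice}")
```

With an estimate of 1000 rows the nested-loops figure comes to 2019, far above 46, so the hash join with two full scans looks cheaper. With an estimate of 1 row it comes to 21, which matches the cost reported in the second plan.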

From the above discussion, we can conclude that stats are necessary on all columns. In this test case the stats were gathered with histograms, but the same result is obtained without histograms, as long as the column stats are populated.

To sum up, stats on all columns are required for an optimal execution plan.

Sunday, July 27, 2008
