Why my distinct query is not using index?

This question "I am running a distinct query on a column which is having index, but my distinct query is not using the index", was asked by one of my friend. Answer was pretty simple. With this post I am trying to answer it with some examples

Without going in much details lets test this scenario, by creating a table SOME_DATA, adding an INDEX on VAL column and run a DISTINCT query

SQL> create table some_data as
  2  select trunc(dbms_random.value(1,100)) val, a.* from dba_objects a;
Table created.

SQL> desc some_data
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 VAL                                                NUMBER
 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(128)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                          NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)
 NAMESPACE                                          NUMBER
 EDITION_NAME                                       VARCHAR2(30)

SQL> create index some_data_val_indx on some_data(val);
Index created.

SQL> exec dbms_stats.gather_table_stats(null, 'SOME_DATA', cascade=>true);
PL/SQL procedure successfully completed.

SQL> set autot trace
SQL> select distinct val from some_data;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 443606636

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    99 |   297 |    77   (2)| 00:00:01 |
|   1 |  HASH UNIQUE       |           |    99 |   297 |    77   (2)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SOME_DATA | 18864 | 56592 |    76   (0)| 00:00:01 |
--------------------------------------------------------------------------------


As we can see that table "SOME_DATA" was FULL TABLE SCANNED to get the results, even if we were having INDEX on the column. Let me try it with INDEX HINT.
SQL> select /*+ index(some_data) */ distinct val from some_data;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 443606636

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    99 |   297 |    77   (2)| 00:00:01 |
|   1 |  HASH UNIQUE       |           |    99 |   297 |    77   (2)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SOME_DATA | 18864 | 56592 |    76   (0)| 00:00:01 |
--------------------------------------------------------------------------------

As you can see that table "SOME_DATA" was FULL TABLE SCANNED to get the results, even if even if we provided the INDEX HINT. Why? Answer is pretty simple, it is because B*Tree INDEX does not store NULL values. So Oracle can't be sure to find distinct values just by scanning an Index.

So, if my column is NOT NULL, will Oracle use INDEX? Lets see

SQL> alter table some_data modify(val not null);
Table altered.

SQL> SQL> select distinct val from some_data;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1329759908
--------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |    99 |   297 |    48   (8)| 00:00:01 |
|   1 |  SORT UNIQUE NOSORT   |                    |    99 |   297 |    48   (8)| 00:00:01 |
|   2 |   INDEX FULL SCAN     | SOME_DATA_VAL_INDX | 18864 | 56592 |    44   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Now as we can see here that after modifying the column NOT NULL, Oracle was able to perform INDEX FULL SCAN. Actually I was hoping that Oracle would do a INDEX FAST FULL SCAN, it usually do by itself in such cases. Just to add here, if we are sure that all the columns of a table in query can be fetched by index only, prefer INDEX FAST FULL SCAN to have multi-block read. Lets try by adding INDEX_FSS hint.

SQL> select /*+ index_ffs(some_data) */ distinct val from some_data;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1329759908
--------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |    99 |   297 |    13   (8)| 00:00:01 |
|   1 |  HASH UNIQUE          |                    |    99 |   297 |    13   (8)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| SOME_DATA_VAL_INDX | 18864 | 56592 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Great, We can observe huge improvement by reduction of cost when Oracle perfomed INDEX FAST FULL SCAN (multiblock read). Now what if we can not modify our column to NOT NULL and still want to use INDEX, we simply need to add predicate to have NOT NULL in query as

SQL> alter table some_data modify (val null);
Table altered.

SQL> select /*+ index_ffs(some_data) */ distinct val from some_data where val is not null;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1329759908
--------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |    99 |   297 |    13   (8)| 00:00:01 |
|   1 |  HASH UNIQUE          |                    |    99 |   297 |    13   (8)| 00:00:01 |
|*  2 |   INDEX FAST FULL SCAN| SOME_DATA_VAL_INDX | 18864 | 56592 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Summary:
1) B*Tree INDEX does not store NULLs
2) Distinct query on nullable column can't use INDEX
3) To use INDEX with DISTINCT query on nullable column, add predicate to have NOT NULL values and INDEX HINT if needed
4) Distinct query on NOT NULL column can use INDEX, check plan and add INDEX_FFS HINT if needed
5) If an INDEX have all the columns in query/table, take advantage of multiblock read [INDEX_FFS]

Please do post your feedback.

Related Posts:
- SQL Tuning: Partition Pruning and HASH JOIN - Real Life Example
- Tune Complete Refresh of Materialized View by atomic_refresh=>false parameter of dbms_mview.refresh
- Query optimization tips for Oracle
- Avoiding unnecessary function calls to optimize SQL statements
- Why do I prefer COALESCE over NVL
- PL/SQL Tuning: Bulk Collect with Dynamic SQL
- PL/SQL Tuning: Bind Variables and execute immediate
- Reduce database calls by posting Multiple Records from Application to Database

25 comments:

  1. Hi Nimish, I have one doubt.. here we have not null values in value column then oracle is not using index
    does it mean that in order to use index with distinct query column should explicitly be declared as NOT NULL..
    it doesnt matter if that column reallly has null value or not..only when the column is declared as not null oracle will use index..

    please clarify..

    ReplyDelete
    Replies
    1. even if your column does not contain NULL, Oracle will do FTS. Workaround is to add where your_column is not null clause in your query

      Delete
    2. thanks nimish for quick update on this!!!

      Delete
  2. https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3193365100346233806

    ReplyDelete
  3. short n crisp..... nice explanation...

    ReplyDelete
  4. gr8 explanation..

    ReplyDelete
  5. I read your post and found to be excellent, thanks for sharing :) . Cheers

    ReplyDelete
  6. Carlo S. CoccioloMay 21, 2016 at 7:25 AM

    B-Tree indexes are not used also when there are too few distinct values (see *_TAB_COLS.Num_Distinct) in the selected column; read Mark Gurry's "SQL Tuning Pocket Reference" (O'Reilly, 2001) for reference.

    ReplyDelete
  7. Raj Kumar KushwahaMay 21, 2016 at 8:42 AM

    This one is excellent help in SQL Tuning...thanks much for sharing

    ReplyDelete
  8. Rajlakshmi GhoshMay 21, 2016 at 4:55 PM

    Really excellent blog:-)

    ReplyDelete
  9. Nimish do you know why you had to force FFS, optinizer should pick it. Don't you think so.

    Regards,
    Virendra

    ReplyDelete
    Replies
    1. Yes I was also hoping that Oracle will do INDEX_FFS. And after this post I gathered system stats, afterwards it is doing INDEX_FFS itself

      Delete
  10. Hi Nimish. First of all thanks for such demonstration. I was surprised that Oracle did not pick up the Index Fast Full Scan automatically and instead went for Index Full scan. I tested the scenario with similar data as yours on 11.2.0.3 and 12.1.0.2 EE Win 32 and 64 bit and found that in both cases INDEX FFS is chosen. I do not have any system stats gathered either. Then I thought of flouting with my default parameter values and changed optimizer_index_cost_adj to 1 (from default value 100) and executed the query to retrieve the cursor cache and from plan table output. In both cases that was INDEX FULL SCAN with the previous step being SORT UNIQUE NOSORT. That is exactly as yours. Can you check how are the full scan/index driving parameters shaped up in your system?

    ReplyDelete
    Replies
    1. I executed all the things on this post in just one flow on my Oracle 11g XE. All parameters were set to default

      Delete
    2. I tried the same on 11g XE (11.2.0.2). Everything default on a Win32 bit system with the same INDEX FFS result. The only difference could be the version and number of rows.

      Delete
  11. Arijit SannigrahiMay 24, 2016 at 3:15 PM

    what an explanation sir.. Really nice to learn these tuning stuffs.. Will be waiting for your next post

    ReplyDelete
  12. a good blog entry - this is a common problem in oracle tuning, and explained clearly.

    ReplyDelete
  13. I believe it´s depends upon on how statistics is updated , how many distincts values does the column have , how many rows does the table have , the kind of the index used etc...

    ReplyDelete
  14. Hi Nimish,
    Thank you so much for your post.

    ReplyDelete
  15. Hi Nimish,

    Can you post some tutorial for different types of scan methods oracle used
    ?

    ReplyDelete
    Replies
    1. Yeah that would be really helpful to start to learn tuning.

      Delete