Why my distinct query is not using index?

This question "I am running a distinct query on a column which is having index, but my distinct query is not using the index", was asked by one of my friend. Answer was pretty simple. With this post I am trying to answer it with some examples

Without going in much details lets test this scenario, by creating a table SOME_DATA, adding an INDEX on VAL column and run a DISTINCT query

SQL> create table some_data as
  2  select trunc(dbms_random.value(1,100)) val, a.* from dba_objects a;
Table created.

SQL> desc some_data
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 VAL                                                NUMBER
 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(128)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                          NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)
 NAMESPACE                                          NUMBER
 EDITION_NAME                                       VARCHAR2(30)

SQL> create index some_data_val_indx on some_data(val);
Index created.

SQL> exec dbms_stats.gather_table_stats(null, 'SOME_DATA', cascade=>true);
PL/SQL procedure successfully completed.

SQL> set autot trace
SQL> select distinct val from some_data;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 443606636

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    99 |   297 |    77   (2)| 00:00:01 |
|   1 |  HASH UNIQUE       |           |    99 |   297 |    77   (2)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SOME_DATA | 18864 | 56592 |    76   (0)| 00:00:01 |
--------------------------------------------------------------------------------

As we can see that table "SOME_DATA" was FULL TABLE SCANNED to get the results, even if we were having INDEX on the column. Let me try it with INDEX HINT.

SQL> select /*+ index(some_data) */ distinct val from some_data;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 443606636

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    99 |   297 |    77   (2)| 00:00:01 |
|   1 |  HASH UNIQUE       |           |    99 |   297 |    77   (2)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SOME_DATA | 18864 | 56592 |    76   (0)| 00:00:01 |
--------------------------------------------------------------------------------

As you can see that table "SOME_DATA" was FULL TABLE SCANNED to get the results, even if even if we provided the INDEX HINT. Why? Answer is pretty simple, it is because B*Tree INDEX does not store NULL values. So Oracle can't be sure to find distinct values just by scanning an Index.

So, if my column is NOT NULL, will Oracle use INDEX? Lets see

SQL> alter table some_data modify(val not null);
Table altered.

SQL> SQL> select distinct val from some_data;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1329759908
--------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |    99 |   297 |    48   (8)| 00:00:01 |
|   1 |  SORT UNIQUE NOSORT   |                    |    99 |   297 |    48   (8)| 00:00:01 |
|   2 |   INDEX FULL SCAN     | SOME_DATA_VAL_INDX | 18864 | 56592 |    44   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Now as we can see here that after modifying the column NOT NULL, Oracle was able to perform INDEX FULL SCAN. Actually I was hoping that Oracle would do a INDEX FAST FULL SCAN, it usually do by itself in such cases. Just to add here, if we are sure that all the columns of a table in query can be fetched by index only, prefer INDEX FAST FULL SCAN to have multi-block read. Lets try by adding INDEX_FSS hint.

SQL> select /*+ index_ffs(some_data) */ distinct val from some_data;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1329759908
--------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |    99 |   297 |    13   (8)| 00:00:01 |
|   1 |  HASH UNIQUE          |                    |    99 |   297 |    13   (8)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| SOME_DATA_VAL_INDX | 18864 | 56592 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Great, We can observe huge improvement by reduction of cost when Oracle perfomed INDEX FAST FULL SCAN (multiblock read). Now what if we can not modify our column to NOT NULL and still want to use INDEX, we simply need to add predicate to have NOT NULL in query as

SQL> alter table some_data modify (val null);
Table altered.

SQL> select /*+ index_ffs(some_data) */ distinct val from some_data where val is not null;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1329759908
--------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |    99 |   297 |    13   (8)| 00:00:01 |
|   1 |  HASH UNIQUE          |                    |    99 |   297 |    13   (8)| 00:00:01 |
|*  2 |   INDEX FAST FULL SCAN| SOME_DATA_VAL_INDX | 18864 | 56592 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Summary:
1) B*Tree INDEX does not store NULLs
2) Distinct query on nullable column can't use INDEX
3) To use INDEX with DISTINCT query on nullable column, add predicate to have NOT NULL values and INDEX HINT if needed
4) Distinct query on NOT NULL column can use INDEX, check plan and add INDEX_FFS HINT if needed
5) If an INDEX have all the columns in query/table, take advantage of multiblock read [INDEX_FFS]

Please do post your feedback.

Related Posts:
- Why Primary Key Foreign Key Relationship and Join Elimination
- SQL Tuning: Partition Pruning and HASH JOIN - Real Life Example
- Tune Complete Refresh of Materialized View by atomic_refresh=>false parameter of dbms_mview.refresh
- Query optimization tips for Oracle
- Avoiding unnecessary function calls to optimize SQL statements
- Why do I prefer COALESCE over NVL
- PL/SQL Tuning: Bulk Collect with Dynamic SQL
- PL/SQL Tuning: Bind Variables and execute immediate
- Reduce database calls by posting Multiple Records from Application to Database

33 comments:

UnknownMay 20, 2016 at 10:22 AM
Hi Nimish, I have one doubt.. here we have not null values in value column then oracle is not using index
does it mean that in order to use index with distinct query column should explicitly be declared as NOT NULL..
it doesnt matter if that column reallly has null value or not..only when the column is declared as not null oracle will use index..

please clarify..

ReplyDelete
Replies
UnknownMay 20, 2016 at 11:58 AM
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:3193365100346233806
ReplyDelete
Replies
Pinto DasMay 20, 2016 at 12:18 PM
short n crisp..... nice explanation...
ReplyDelete
Replies
AnonymousMay 20, 2016 at 1:03 PM
gr8 explanation..
ReplyDelete
Replies
Rahul KalraMay 20, 2016 at 5:09 PM
I read your post and found to be excellent, thanks for sharing :) . Cheers
ReplyDelete
Replies
UnknownMay 20, 2016 at 7:23 PM
Wonderful Explanation....Thanks
ReplyDelete
Replies
Carlo S. CoccioloMay 21, 2016 at 7:25 AM
B-Tree indexes are not used also when there are too few distinct values (see *_TAB_COLS.Num_Distinct) in the selected column; read Mark Gurry's "SQL Tuning Pocket Reference" (O'Reilly, 2001) for reference.
ReplyDelete
Replies
Raj Kumar KushwahaMay 21, 2016 at 8:42 AM
This one is excellent help in SQL Tuning...thanks much for sharing
ReplyDelete
Replies
Rajlakshmi GhoshMay 21, 2016 at 4:55 PM
Really excellent blog:-)
ReplyDelete
Replies
UnknownMay 22, 2016 at 11:25 PM
Nimish do you know why you had to force FFS, optinizer should pick it. Don't you think so.

Regards,
Virendra
ReplyDelete
Replies
JobgardMay 23, 2016 at 7:24 PM
Great Explanation Nimish
ReplyDelete
Replies
UnknownMay 24, 2016 at 2:37 AM
Hi Nimish. First of all thanks for such demonstration. I was surprised that Oracle did not pick up the Index Fast Full Scan automatically and instead went for Index Full scan. I tested the scenario with similar data as yours on 11.2.0.3 and 12.1.0.2 EE Win 32 and 64 bit and found that in both cases INDEX FFS is chosen. I do not have any system stats gathered either. Then I thought of flouting with my default parameter values and changed optimizer_index_cost_adj to 1 (from default value 100) and executed the query to retrieve the cursor cache and from plan table output. In both cases that was INDEX FULL SCAN with the previous step being SORT UNIQUE NOSORT. That is exactly as yours. Can you check how are the full scan/index driving parameters shaped up in your system?
ReplyDelete
Replies
Arijit SannigrahiMay 24, 2016 at 3:15 PM
what an explanation sir.. Really nice to learn these tuning stuffs.. Will be waiting for your next post
ReplyDelete
Replies
Martin FerrierMay 24, 2016 at 7:57 PM
a good blog entry - this is a common problem in oracle tuning, and explained clearly.
ReplyDelete
Replies
Paulo O D MottaMay 24, 2016 at 7:57 PM
I believe it´s depends upon on how statistics is updated , how many distincts values does the column have , how many rows does the table have , the kind of the index used etc...
ReplyDelete
Replies
sagarMay 25, 2016 at 7:47 AM
Hi Nimish,
Thank you so much for your post.
ReplyDelete
Replies
Tuheen MannaMay 25, 2016 at 7:54 AM
Nice Explanation
ReplyDelete
Replies
good and great thinkingMay 29, 2016 at 8:27 AM
nice yr
ReplyDelete
Replies
Mohit October 19, 2016 at 12:10 PM
Hi Nimish,

Can you post some tutorial for different types of scan methods oracle used
?
ReplyDelete
Replies
AnonymousJuly 2, 2017 at 10:03 AM
Hello! I could have sworn I've visited this website before
but after going through some of the posts I realized it's new to me.

Regardless, I'm definitely pleased I discovered it and I'll be bookmarking it and checking back often!
ReplyDelete
Replies
AnonymousJuly 14, 2017 at 12:25 AM
Moderate phlebitis will often resolve itself.
ReplyDelete
Replies
AnonymousAugust 1, 2017 at 8:58 AM
We're a group of volunteers and starting a new scheme in our community.
Your website offered us with valuable information to work on. You've done a formidable job and our whole community
will be grateful to you.
ReplyDelete
Replies
AnonymousSeptember 2, 2017 at 10:36 PM
If you would like to take a great deal from this article then you have
to apply these strategies to your won web site.
ReplyDelete
Replies
AnonymousOctober 5, 2017 at 1:43 PM
Website load time is an important factor, because Google
is taking the site's loading speed into consideration in determining its ranking.
ReplyDelete
Replies
AnonymousDecember 9, 2017 at 1:36 PM
Spot on with this write-up, I actually feel
this web site needs a great deal more attention. I'll probably be back again to read through more, thanks for the info!
ReplyDelete
Replies
AnonymousFebruary 8, 2018 at 12:32 AM
Have you ever considered about including a little bit
more than just your articles? I mean, what you say is valuable and everything.
But imagine if you added some great photos or videos
to give your posts more, "pop"! Your content is excellent but with images and clips, this blog could undeniably be one of the very best in its field.

Fantastic blog!
ReplyDelete
Replies
AnonymousMay 18, 2018 at 3:45 AM
If some one needs expert view concerning blogging then i recommend him/her to pay a visit this weblog, Keep up the pleasant job.
ReplyDelete
Replies

Add comment