Optimizing SQL with MIN and MAX - Real Life Approach

Here I am sharing a question from OTN which was to write a SQL to find the the rows where date value is a quarter-end (3/31, 6/30, 9/30 and 12/31) date from a table of 4.5 million rows. OP already posted many SQLs solving the problem with different tactics so that problem can be solved in the optimized manner. I picked up the best solution (according to me) among them and optimized it little further. Let's see the original SQL and how I optimized it further.

Here is how table, Index and data was generated.

SQL> create table a (nbr number, dt date);
Table created.

SQL> insert into a (nbr, dt)
  2  select level,
  3         trunc(date '2001-01-01' + dbms_random.value(0, date '2017-01-01' - date '2001-01-01'))
  4  from dual
  5  connect by level <= 1500000
  6  union all
  7  select level + 1500000,
  8         trunc(date '2001-01-01' + dbms_random.value(0, date '2017-01-01' - date '2001-01-01'))
  9  from dual
 10  connect by level <= 1500000
 11  union all
 12  select level + 3000000,
 13         trunc(date '2001-01-01' + dbms_random.value(0, date '2017-01-01' - date '2001-01-01'))
 14  from dual
 15  connect by level <= 1500000;
4500000 rows created.

SQL> commit;
Commit complete.

SQL> create index a_indx on a(dt);
Index created.

So here is the query for the solution which OP posted and I think most of the developers would create this one only. Approach of the SQL is very straight forward.
1. Select minimum and maximum date from table
2. Generate all Quarter End Dates from minimum and maximum date
3. Select Data from table where date value is in generated dates in above step

SQL> WITH date_range (min_dt, max_dt) AS (SELECT MIN (dt), MAX (dt) FROM a),
  2       qtr_ends (qe_dt)
  3       AS (    SELECT ADD_MONTHS (TRUNC (min_dt) - 1, 3 * LEVEL)
  4                 FROM date_range
  5           CONNECT BY ADD_MONTHS (TRUNC (min_dt) - 1, 3 * LEVEL) <= max_dt)
  6  SELECT *
  7    FROM a
  8   WHERE dt IN (SELECT qe_dt FROM qtr_ends);

49393 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2111837994
--------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |   778 | 21784 |   618   (0)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI           |          |   778 | 21784 |   618   (0)| 00:00:01 |
|   2 |   VIEW                          | VW_NSO_1 |     1 |     6 |   309   (0)| 00:00:01 |
|   3 |    VIEW                         |          |     1 |     6 |   309   (0)| 00:00:01 |
|*  4 |     CONNECT BY WITHOUT FILTERING|          |       |       |            |          |
|   5 |      VIEW                       |          |     1 |    18 |   309   (0)| 00:00:01 |
|   6 |       SORT AGGREGATE            |          |     1 |     9 |            |          |
|   7 |        TABLE ACCESS FULL        | A        |  4549K|    39M|   309   (0)| 00:00:01 |
|   8 |   TABLE ACCESS FULL             | A        |  4549K|    95M|   309   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("DT"=INTERNAL_FUNCTION("QE_DT"))
   4 - filter("MAX_DT">=ADD_MONTHS(TRUNC(INTERNAL_FUNCTION("MIN_DT"))-1,3*LEVEL))

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
       6460  consistent gets
          0  physical reads
          0  redo size
     773436  bytes sent via SQL*Net to client
       5899  bytes received via SQL*Net from client
        495  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      49393  rows processed

When I checked the above plan, I noticed two "TABLE ACCESS FULL", but as we know MIN/MAX function can use the INDEX so I tried to rewrite the above query by just changing the "date_range" WITH factored Sub-Query.

SQL> WITH date_range (min_dt, max_dt)
  2       AS (SELECT *
  3             FROM (SELECT MIN (dt) FROM a), (SELECT MAX (dt) FROM a)),
  4       qtr_ends (qe_dt)
  5       AS (    SELECT ADD_MONTHS (TRUNC (min_dt) - 1, 3 * LEVEL)
  6                 FROM date_range
  7           CONNECT BY ADD_MONTHS (TRUNC (min_dt) - 1, 3 * LEVEL) <= max_dt)
  8  SELECT *
  9    FROM a
 10   WHERE dt IN (SELECT qe_dt FROM qtr_ends);

49393 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 771581248

---------------------------------------------------------------------------------------------
| Id  | Operation                        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |          |   778 | 21784 |   315   (0)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI            |          |   778 | 21784 |   315   (0)| 00:00:01 |
|   2 |   VIEW                           | VW_NSO_1 |     1 |     6 |     6   (0)| 00:00:01 |
|   3 |    VIEW                          |          |     1 |     6 |     6   (0)| 00:00:01 |
|*  4 |     CONNECT BY WITHOUT FILTERING |          |       |       |            |          |
|   5 |      NESTED LOOPS                |          |     1 |    18 |     6   (0)| 00:00:01 |
|   6 |       VIEW                       |          |     1 |     9 |     3   (0)| 00:00:01 |
|   7 |        SORT AGGREGATE            |          |     1 |     9 |            |          |
|   8 |         INDEX FULL SCAN (MIN/MAX)| A_INDX   |     1 |     9 |     3   (0)| 00:00:01 |
|   9 |       VIEW                       |          |     1 |     9 |     3   (0)| 00:00:01 |
|  10 |        SORT AGGREGATE            |          |     1 |     9 |            |          |
|  11 |         INDEX FULL SCAN (MIN/MAX)| A_INDX   |     1 |     9 |     3   (0)| 00:00:01 |
|  12 |   TABLE ACCESS FULL              | A        |  4549K|    95M|   309   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("DT"=INTERNAL_FUNCTION("QE_DT"))
   4 - filter("from$_subquery$_003"."MAX(DT)">=ADD_MONTHS(TRUNC(INTERNAL_FUNCTION("fr
              om$_subquery$_001"."MIN(DT)"))-1,3*LEVEL))

Statistics
----------------------------------------------------------
         10  recursive calls
          0  db block gets
       3640  consistent gets
          0  physical reads
          0  redo size
     773436  bytes sent via SQL*Net to client
       5899  bytes received via SQL*Net from client
        495  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      49393  rows processed

As you can see here "TABLE ACCESS FULL" is just done one time and MIN/MAX are retrieved by INDEX FULL SCAN twice. This little tweak has reduced the cost to 315 from 618 and consistent gets are also reduced significantly to 3640 from 6460.

I hope you have enjoyed reading this post and might have learnt something new. Please post your feedback in comment box. You can also Buy me a coffee.

Read More on Tuning SQL
- VIEW PUSHED PREDICATE - A Powerful Optimizer Transformation feature
- Why my distinct query is not using index?
- Avoiding unnecessary function calls to optimize SQL statements
- Why do I prefer COALESCE over NVL
- Query optimization tips for Oracle
Read More on Tuning PL/SQL
- Reduce database calls by posting Multiple Records
- Append String to CLOB in Optimized way
- PLSQL Tuning: Bulk Collect with Dynamic SQL
- PLSQL Tuning: Bind Variables and execute immediate
Read More on Analyzing SQL & PL/SQL
- How to Get Execution Plan and Statistics of SQL Query
- DBMS_PROFILER: Overview and How to Install
- TKPROF: Generating and analyzing trace file