Fill Gaps in Sparse Data - Partitioned Outer Join

There are times when we need to densify the sparse data i.e. fill the gaps in a table. For example we have a SALES table which has all the product and date wise records of sales, but it don't contain the the records that had no sales on a given date. Now the requirement is to have a date wise list of all products with total count of products and amount, if product was not sold on a date, it should come but total count & total amount of the product for that date should be zero.

Reference: Oracle Documentation

Lets first create data to have better understanding

create table product
(
  id      number(10),
  name    varchar2(100)
);

create table sales
(
  pid           number(5),
  sdate         date,
  samount       number(10,2)
);

insert into product values (1, 'Mouse');
insert into product values (2, 'Keyboard');
insert into product values (3, 'Monitor');
insert into product values (4, 'Speaker');
insert into product values (5, 'CPU');

insert into sales values(1, to_date('01-Jan-2014','dd-mon-yyyy'), 100);
insert into sales values(2, to_date('01-Jan-2014','dd-mon-yyyy'), 500);
insert into sales values(4, to_date('01-Jan-2014','dd-mon-yyyy'), 1200);
insert into sales values(5, to_date('01-Jan-2014','dd-mon-yyyy'), 10000);
insert into sales values(4, to_date('01-Jan-2014','dd-mon-yyyy'), 1200);
insert into sales values(1, to_date('02-Jan-2014','dd-mon-yyyy'), 100);
insert into sales values(3, to_date('02-Jan-2014','dd-mon-yyyy'), 8000);
insert into sales values(4, to_date('02-Jan-2014','dd-mon-yyyy'), 1200);
insert into sales values(5, to_date('02-Jan-2014','dd-mon-yyyy'), 10000);
insert into sales values(1, to_date('02-Jan-2014','dd-mon-yyyy'), 100);
insert into sales values(1, to_date('03-Jan-2014','dd-mon-yyyy'), 100);
insert into sales values(2, to_date('03-Jan-2014','dd-mon-yyyy'), 500);
insert into sales values(3, to_date('03-Jan-2014','dd-mon-yyyy'), 8000);
insert into sales values(3, to_date('03-Jan-2014','dd-mon-yyyy'), 8000);
insert into sales values(4, to_date('03-Jan-2014','dd-mon-yyyy'), 1200);
insert into sales values(3, to_date('04-Jan-2014','dd-mon-yyyy'), 8000);
insert into sales values(4, to_date('04-Jan-2014','dd-mon-yyyy'), 1200);
insert into sales values(1, to_date('04-Jan-2014','dd-mon-yyyy'), 100);
insert into sales values(2, to_date('04-Jan-2014','dd-mon-yyyy'), 500);
insert into sales values(3, to_date('04-Jan-2014','dd-mon-yyyy'), 8000);
insert into sales values(4, to_date('04-Jan-2014','dd-mon-yyyy'), 1200);
insert into sales values(2, to_date('05-Jan-2014','dd-mon-yyyy'), 500);
insert into sales values(3, to_date('05-Jan-2014','dd-mon-yyyy'), 8000);
insert into sales values(2, to_date('05-Jan-2014','dd-mon-yyyy'), 500);
insert into sales values(3, to_date('05-Jan-2014','dd-mon-yyyy'), 8000);
insert into sales values(4, to_date('05-Jan-2014','dd-mon-yyyy'), 1200);


select
  name,
  sdate,
  sum(samount) 
from
  product
  left outer join 
  sales
  on (sales.pid = product.id)
group by
  name,
  sdate
order by sdate, name

NAME                           SDATE     SUM(SAMOUNT)
------------------------------ --------- ------------
CPU                            01-JAN-14        10000
Keyboard                       01-JAN-14          500
Mouse                          01-JAN-14          100
Speaker                        01-JAN-14         2400
CPU                            02-JAN-14        10000
Monitor                        02-JAN-14         8000
Mouse                          02-JAN-14          200
Speaker                        02-JAN-14         1200
Keyboard                       03-JAN-14          500
Monitor                        03-JAN-14        16000
Mouse                          03-JAN-14          100
Speaker                        03-JAN-14         1200
Keyboard                       04-JAN-14          500
Monitor                        04-JAN-14        16000
Mouse                          04-JAN-14          100
Speaker                        04-JAN-14         2400
Keyboard                       05-JAN-14         1000
Monitor                        05-JAN-14        16000
Speaker                        05-JAN-14         1200

As you can see in above output, we have date wise data for all the products, but some of the product and sale date combinations are missing as there was no records for them in sales tables. To overcome the problem of sparsity, you can use a partition outer join to fill the gaps for this dimension.

select
  name,
  sdate,
  sum(nvl(samount,0)) 
from
  product
  left outer join 
  sales
  PARTITION BY (SDATE)
  on (sales.pid = product.id)
group by
  name,
  sdate
order by sdate, name

NAME                         SDATE     SUM(NVL(SAMOUNT,0))
---------------------------- --------- -------------------
CPU                          01-JAN-14               10000
Keyboard                     01-JAN-14                 500
Monitor                      01-JAN-14                   0
Mouse                        01-JAN-14                 100
Speaker                      01-JAN-14                2400
CPU                          02-JAN-14               10000
Keyboard                     02-JAN-14                   0
Monitor                      02-JAN-14                8000
Mouse                        02-JAN-14                 200
Speaker                      02-JAN-14                1200
CPU                          03-JAN-14                   0
Keyboard                     03-JAN-14                 500
Monitor                      03-JAN-14               16000
Mouse                        03-JAN-14                 100
Speaker                      03-JAN-14                1200
CPU                          04-JAN-14                   0
Keyboard                     04-JAN-14                 500
Monitor                      04-JAN-14               16000
Mouse                        04-JAN-14                 100
Speaker                      04-JAN-14                2400
CPU                          05-JAN-14                   0
Keyboard                     05-JAN-14                1000
Monitor                      05-JAN-14               16000
Mouse                        05-JAN-14                   0
Speaker                      05-JAN-14                1200


This looks like a cross join query, but it is much more efficient here because the SALES table is selected only once. The SDATE column is selected even when there is no matching row in SALES.

ANSI syntax for SQL in ORACLE is recommended by Oracle. It supports various new features like outer joining with more than one table and as we see Partitioned Outer Joins, but ANSI syntax also comes with several limitations, major one is that it cannot be used in fast refreshable materialized views.



Related Posts:
- FIRST_VALUE and LAST_VALUE with Windowing Clause
- Analytic Functions: Rank, Dense_Rank, Row_Number
- LAG and LEAD - Analytic Function
- Getting Cumulative Sum (Running Total) Using Analytical Functions
- New Features for Developers in Oracle 11g
- Oracle Database 12c New Features for Developers

5 comments:

  1. Good use of ansi sql

    ReplyDelete
  2. Beneficial tips & steps! I've gone through the functions and detailed process. Really its awesome & very beneficial for essential individuals!

    Thanks for sharing such a detailed and informative post! Oracle DBA Services

    ReplyDelete
  3. I've never seen this option used, thanks!

    This may be stating the obvious, but this would still not report anything if there were NO SALES at all for a given day. That's probably never going to happen in this sales example, but...

    ReplyDelete
    Replies
    1. may be then you have to use a table with all the dates !!!

      Delete
  4. can u explain how this query is executing??

    ReplyDelete