skip to Main Content

I’m relatively early in SQL learning. I have made some data which has records where the date is missing. I want to capture as an output records that show the missing dates and the associated department name affected.

output of missing dates expected to be like

Date        DepartmentID
2023-11-03  3001
2023-11-03  4001
2023-11-06  1001
2023-11-06  2001
2023-11-07  1001
2023-11-07  2001
2023-11-07  4001
2023-11-09  4001

My table setup

Create table Departments (date date,DepartmentID int, Name text);

insert into Departments values
(to_date('1.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('1.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('1.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('1.11.23','DD.MM,YY'),4001,'ds'),
(to_date('2.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('2.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('2.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('2.11.23','DD.MM,YY'),4001,'ds'),
(to_date('3.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('3.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('4.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('4.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('4.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('4.11.23','DD.MM,YY'),4001,'ds'),
(to_date('5.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('5.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('5.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('5.11.23','DD.MM,YY'),4001,'ds'),
(to_date('6.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('6.11.23','DD.MM,YY'),4001,'ds'),
(to_date('7.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('8.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('8.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('8.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('8.11.23','DD.MM,YY'),4001,'ds'),
(to_date('9.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('9.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('9.11.23','DD.MM,YY'),3001,'Accounting');

So far my logic is the following sql whilst following instructions from here

with lastdate as
(select max(date) as Maxdate from Departments)

  select date, 
  --lead(date),
  select maxdate from lastdate OVER(partition by date ORDER BY date) as Next_Date 
  from Departments

But this gives errors as seen here

ERROR: syntax error at or near "select" LINE 6: select maxdate from
lastdate OVER(partition by date ORDER …
^

This is in postgres. Things I’m confused about

  1. How do run a CTE and have an OVER statement run along side against the CTE
  2. Best way to approach this problem.

2

Answers


  1. You want to find departments and dates that are missing from the departments table. So create a table with all dates and departments that you expect and subtract the ones that exist.

    with 
      all_dates (date) as
      (
        select generate_series(min(date), max(date), interval '1 day') from departments
      ),
      all_departments as
      (
        select distinct departmentid from departments
      )
    select all_dates.date, all_departments.departmentid
    from all_dates cross join all_departments
    except
    select date, departmentid
    from departments
    order by date, departmentid;
    

    Demo: https://dbfiddle.uk/rjCV9s7r

    Login or Signup to reply.
  2. In PostgreSQL, anti-joins are faster than except: demo at db<>fiddle

    with required_dept_dates as (
      select distinct DepartmentID, 
             generate_series('2023.11.01'::date,'2023.11.10','1 day')::date AS date
      from Departments)
    select required_dept_dates.*
    from Departments natural right join required_dept_dates
    where Departments is null order by 2,1;
    

    It takes 0.2s. You pick the period you want to zoom in on by setting the start and stop dates in generate_series(). When used in the select list, it’ll spawn all required dates for all possible departments, which you join with your table and pick out the dates and depts that your table didn’t have a match for (Departments is null).

    Same using except will be a few times slower:

    select distinct DepartmentID, 
           generate_series('2023.11.01'::date,'2023.11.10','1 day')::date AS date
    from Departments
    except
    select DepartmentID, date
    from Departments;
    

    To get all dates instead of a specific period, you can generate them based on min() and max() separately, or just plug uncorrelated scalar subqueries directly where you need these values:

    with required_dept_dates as (
      select distinct DepartmentID, 
             generate_series((select min(date) from Departments),
                             (select max(date) from Departments),
                             '1 day')::date AS date
      from Departments)
    select required_dept_dates.*
    from Departments natural right join required_dept_dates
    where Departments is null;
    

    In case you want to ignore dates when nobody reported anything (November 3rd and 7th in your fiddle): instead of generating the full calendar, extract all distinct dates that you have and combine them with all distinct depts:

    with required_dept as (
      select distinct DepartmentID from Departments)
    ,required_dept_date as (
      select distinct date,rd.DepartmentID from Departments, required_dept AS rd)
    select required_dept_date.*
    from Departments natural right join required_dept_date
    where Departments is null;
    

    In any case, you might want to aggregate to get one array of all missing departments for each date:

    with required_dept_date as (
      select distinct DepartmentID, 
             generate_series((select min(date) from Departments),
                             (select max(date) from Departments),
                             '1 day')::date AS date
      from Departments)
    select required_dept_date.date,
           array_agg(required_dept_date.departmentid 
                     order by required_dept_date.departmentid) AS missing
    from Departments natural right join required_dept_date
    where Departments is null
    group by 1 order by 1;
    
    date missing
    2023-11-01 {700,2001,3001,4001,5001,6001}
    2023-11-02 {700,1001,3001,4001,5001,6001}
    2023-11-03 {700,1001,2001,3001,4001,5001,6001}
    2023-11-04 {700,1001,2001,4001,5001,6001}
    2023-11-05 {700,1001,2001,3001,5001,6001}
    2023-11-06 {700,1001,2001,3001,4001,6001}
    2023-11-07 {700,1001,2001,3001,4001,5001,6001}
    2023-11-08 {700,1001,2001,3001,4001,5001}
    2023-11-09 {1001,2001,3001,4001,5001,6001}
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search