I’m relatively early in SQL learning. I have made some data which has records where the date is missing. I want to capture as an output records that show the missing dates and the associated department name affected.
output of missing dates expected to be like
Date DepartmentID
2023-11-03 3001
2023-11-03 4001
2023-11-06 1001
2023-11-06 2001
2023-11-07 1001
2023-11-07 2001
2023-11-07 4001
2023-11-09 4001
My table setup
Create table Departments (date date,DepartmentID int, Name text);
insert into Departments values
(to_date('1.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('1.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('1.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('1.11.23','DD.MM,YY'),4001,'ds'),
(to_date('2.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('2.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('2.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('2.11.23','DD.MM,YY'),4001,'ds'),
(to_date('3.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('3.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('4.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('4.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('4.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('4.11.23','DD.MM,YY'),4001,'ds'),
(to_date('5.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('5.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('5.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('5.11.23','DD.MM,YY'),4001,'ds'),
(to_date('6.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('6.11.23','DD.MM,YY'),4001,'ds'),
(to_date('7.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('8.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('8.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('8.11.23','DD.MM,YY'),3001,'Accounting'),
(to_date('8.11.23','DD.MM,YY'),4001,'ds'),
(to_date('9.11.23','DD.MM,YY'),1001,'SRO'),
(to_date('9.11.23','DD.MM,YY'),2001,'Drs'),
(to_date('9.11.23','DD.MM,YY'),3001,'Accounting');
So far my logic is the following sql whilst following instructions from here
with lastdate as
(select max(date) as Maxdate from Departments)
select date,
--lead(date),
select maxdate from lastdate OVER(partition by date ORDER BY date) as Next_Date
from Departments
But this gives errors as seen here
ERROR: syntax error at or near "select" LINE 6: select maxdate from
lastdate OVER(partition by date ORDER …
^
This is in postgres. Things I’m confused about
- How do run a CTE and have an OVER statement run along side against the CTE
- Best way to approach this problem.
2
Answers
You want to find departments and dates that are missing from the departments table. So create a table with all dates and departments that you expect and subtract the ones that exist.
Demo: https://dbfiddle.uk/rjCV9s7r
In PostgreSQL, anti-joins are faster than
except
: demo at db<>fiddleIt takes 0.2s. You pick the period you want to zoom in on by setting the
start
andstop
dates ingenerate_series()
. When used in the select list, it’ll spawn all required dates for all possible departments, which you join with your table and pick out the dates and depts that your table didn’t have a match for (Departments is null
).Same using
except
will be a few times slower:To get all dates instead of a specific period, you can generate them based on
min()
andmax()
separately, or just plug uncorrelated scalar subqueries directly where you need these values:In case you want to ignore dates when nobody reported anything (November 3rd and 7th in your fiddle): instead of generating the full calendar, extract all distinct dates that you have and combine them with all distinct depts:
In any case, you might want to aggregate to get one array of all missing departments for each date: