I have a table that looks like (never mind the database vendor as I need this in ANSI SQL):
create table edge (
parent_id int not null,
child_id int not null,
value float not null,
start date not null,
end date not null
)
I would then have the following CSV input data:
1,2,0,2023-01-01,2023-01-10
1,2,0,2023-01-11,2023-01-20
1,2,0,2023-01-21,NULL
1,3,0,2023-01-01,2023-01-10
1,3,0,2023-01-11,2023-01-20
1,3,1,2023-01-21,NULL
Note the case where the parent is 1 and the child is 2, the same value 0 repeats across multiple date intervals, so it could be collapsed to:
1,2,0,2023-01-01,NULL
1,3,0,2023-01-01,2023-01-20
1,3,1,2023-01-21,NULL
a value of NULL
in end
means valid until further notice.
Ideally there would be a query that would for table edge (yes graph edges) generate the output of the second excerpt. The goal is to first delete redundant rows and update the remaining last row with start
to match the start
of the first deleted row.
The closest I can get is the following, here I am finding all the redundant rows to delete:
SELECT *
FROM edge
WHERE (parent, child, value, start) IN (SELECT parent, child, value, end+1
FROM edge)
ORDER BY parent, child, start
I would then need to do the update step to leave the time series consistent. On a separate note, value
is being exact matched and it is a float but this works on PostgreSQL at least.
If there is no way using ANSI I would be interested to learn for the PostgreSQL case.
2
Answers
I would first suggest a query that selects the desired rows as per the business rules above that may be used (as a view?) without deleting. I am using the non-ANSI
distinct on
PostgreSQL’s clause.If this is not acceptable then shape the query as a CTE and then delete these rows that are not in it.
Here is a SQL standard-compliant version w/o
distinct on
.SQL Fiddle demo
If the date ranges are continuous I think the easiest (but a bit dirty) approach would be as follows:
Rename table and set
end
to a unique date in the future where it isnull
.Put values in new edge table:
Set
end
tonull
where it matches the unique date you set before and delete old table.