Context
I have two tables entity_a
and entity_b
. The entity_b
is linked to entity_a
through a foreign key, and has another field date
, which is a date
(not a datetime, not a timestamp). The date
field of entity_b
indicates when entity_a
happens.
Problem
I want to count of number of half-days where entity_a
is happening. However to do that, I don’t take into account the time of entity_b
(I don’t make the rules, and no I don’t think they make sense either), just the number of entity_b
per date: on one day, if there are more than two, count 1 day, if there is 1, count 0.5 day.
In other words, each entity_b
counts for 0.5
, for a maximum of 1
per date.
As one more constraint: I want to return one row for each entity_a
, since I’ll be querying multiple at time
Note: using Postgres
Example
In this example, entity_a_1
has 1.0 day, entity_a_2
has 1.5 days
- entity_a_1
- entity_b (2023-05-01)
- entity_b (2023-05-02)
- entity_a_2
- entity_b (2023-05-12)
- entity_b (2023-05-13)
- entity_b (2023-05-13)
- entity_b (2023-05-13)
Thing I’ve tried
first, for those who want to test at home: the schema and data of the example
CREATE TABLE entity_a (id SERIAL PRIMARY KEY);
CREATE TABLE entity_b (id SERIAL PRIMARY KEY, a_id INTEGER REFERENCES entity_a(id), date DATE);
INSERT INTO entity_a (id) VALUES (1), (2);
INSERT INTO entity_b (a_id, date) VALUES (1, '2023-05-01'), (1, '2023-05-02'), (2, '2023-05-12'), (2, '2023-05-13'), (2, '2023-05-13'), (2, '2023-05-13');
I have tried things that did the job using a subquery but with very poor performance:
SELECT
a.id,
SUM(b_agg.count)
FROM
entity_a a
JOIN (
SELECT
b.a_id,
CASE
WHEN COUNT(*) > 1 THEN 1
ELSE 0.5
END AS count
FROM
entity_b b
GROUP BY
b.a_id,
b.date
) b_agg ON b_agg.a_id = a.id
GROUP BY
a.id;
(if you use the data from the example, it will obviously be very fast, but consider there are millions of rows in each table)
(here’s a small bash script that will fill the tables with a million lines. It’s far from reality, yet a little closer for i in {3..1000000}; do; psql --command "INSERT INTO entity_a (id) VALUES ($i); INSERT INTO entity_b (a_id, date) VALUES ($i, '2023-01-01')"; done
)
And I had the feeling window functions could help, but I don’t know much about them so I always ended up failing to use them correctly 😅
Any help is welcome, thanks in advance !
2
Answers
I don’t believe window functions will help. If you remove the
join
because it isn’t required here, the query will be faster:Here’s a solution via window functions. I don’t know if you’ll find that it runs better but it’s worth checking into:
https://dbfiddle.uk/Az4y03-0