Postgresql - Count the number of half-days `entity_a` happens on

EnzoBaldisserri
May 14, 2023
254 views
0 votes
2 Answers

Context

I have two tables entity_a and entity_b. The entity_b is linked to entity_a through a foreign key, and has another field date, which is a date (not a datetime, not a timestamp). The date field of entity_b indicates when entity_a happens.

Problem

I want to count of number of half-days where entity_a is happening. However to do that, I don’t take into account the time of entity_b (I don’t make the rules, and no I don’t think they make sense either), just the number of entity_b per date: on one day, if there are more than two, count 1 day, if there is 1, count 0.5 day.

In other words, each entity_b counts for 0.5, for a maximum of 1 per date.

As one more constraint: I want to return one row for each entity_a, since I’ll be querying multiple at time

Note: using Postgres

Example

In this example, entity_a_1 has 1.0 day, entity_a_2 has 1.5 days

entity_a_1
- entity_b (2023-05-01)
- entity_b (2023-05-02)
entity_a_2
- entity_b (2023-05-12)
- entity_b (2023-05-13)
- entity_b (2023-05-13)
- entity_b (2023-05-13)

Thing I’ve tried

first, for those who want to test at home: the schema and data of the example

CREATE TABLE entity_a (id SERIAL PRIMARY KEY);
CREATE TABLE entity_b (id SERIAL PRIMARY KEY, a_id INTEGER REFERENCES entity_a(id), date DATE);
INSERT INTO entity_a (id) VALUES (1), (2);
INSERT INTO entity_b (a_id, date) VALUES (1, '2023-05-01'), (1, '2023-05-02'), (2, '2023-05-12'), (2, '2023-05-13'), (2, '2023-05-13'), (2, '2023-05-13');

I have tried things that did the job using a subquery but with very poor performance:

SELECT
  a.id,
  SUM(b_agg.count)
FROM
  entity_a a
  JOIN (
    SELECT
      b.a_id,
      CASE
        WHEN COUNT(*) > 1 THEN 1
        ELSE 0.5
      END AS count
    FROM
      entity_b b
    GROUP BY
      b.a_id,
      b.date
  ) b_agg ON b_agg.a_id = a.id
GROUP BY
  a.id;

(if you use the data from the example, it will obviously be very fast, but consider there are millions of rows in each table)
(here’s a small bash script that will fill the tables with a million lines. It’s far from reality, yet a little closer for i in {3..1000000}; do; psql --command "INSERT INTO entity_a (id) VALUES ($i); INSERT INTO entity_b (a_id, date) VALUES ($i, '2023-01-01')"; done)

And I had the feeling window functions could help, but I don’t know much about them so I always ended up failing to use them correctly 😅

Any help is welcome, thanks in advance !

Tags: postgresql sql

Answers

- SelVazi
- May 12, 2023 at 3:41 pm
- 0 votes
0
I don’t believe window functions will help. If you remove the join because it isn’t required here, the query will be faster:
```
SELECT a_id, SUM(count) as count_
FROM (
  SELECT
      b.a_id,
      CASE
        WHEN COUNT(*) > 1 THEN 1
        ELSE 0.5
      END AS count
    FROM
      entity_b b
    GROUP BY
      b.a_id,
      b.date
) AS t
group by t.a_id;
```
Login or Signup to reply.

- shawnt00
- May 12, 2023 at 5:47 pm
- 0 votes
0
Here’s a solution via window functions. I don’t know if you’ll find that it runs better but it’s worth checking into:
```
with data as (
    select *,
        case when date = lag(date) over (partition by a_id order by date)
            then 0.0
            else case when date = lead(date) over (partition by a_id order by date)
                then 1.0 else 0.5 end
        end as weight
    from entity_b
)
select a.id, sum(weight) as total
from entity_a a inner join entity_b b on b.a_id = a.id
group by a.id;
```
https://dbfiddle.uk/Az4y03-0
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Count the number of half-days `entity_a` happens on

Answers