skip to Main Content

I’ve got two tables:

utilities:

id timestamp action
901 2024-08-11 09:59:25.000 on power
902 2024-08-11 09:59:35.000 on water
903 2024-08-11 09:59:55.000 off power
904 2024-08-11 10:01:25.000 on gas
905 2024-08-11 10:02:35.000 off water
906 2024-08-11 10:11:18.000 off power
907 2024-08-11 10:31:28.000 off gas
908 2024-08-11 11:15:37.000 on power

items:

id timestamp action
906 2024-08-11 09:59:45.000 on lights
907 2024-08-11 09:59:58.000 off lights
908 2024-08-11 10:15:34.000 on tap
909 2024-08-11 10:18:25.000 on heating
910 2024-08-11 10:21:44.000 off heating
911 2024-08-11 11:02:35.000 off tap
912 2024-08-11 12:01:08.000 open door
913 2024-08-11 12:11:28.000 closer door

I’m trying to combine them, without the ids clashing (generate a new id?) and then want to do a date_trunc('hour', timestamp) as time, COUNT(*) as metric to work out how many actions are happening in the table per hour, so the end result would be:

time metric
2024-08-11 09:00:00.000 5
2024-08-11 10:00:00.000 7
2024-08-11 11:00:00.000 2
2024-08-11 12:00:00.000 2

I tried this query, but it complains about:

Utilities.timestamp must appear in the GROUP BY …

WITH one AS (
  SELECT 
  date_trunc('hour', timestamp) as timeOne, 
  COUNT(*) as utilities_count
  FROM Utilities
  ORDER BY timeOne
),
two AS (
  SELECT 
  date_trunc('hour', timestamp) as timeTwo, 
  COUNT(*) as item_count
  FROM Items
  ORDER BY timeTwo
)
SELECT 
  SUM(utilities_count, item_count) as metric,
  timeOne as time
FROM one, two
ORDER BY 1;

Any idea how to get the data combined and counted with the correct hourly binning?

2

Answers


  1. Simply:

    SELECT date_trunc('hour', timestamp) AS time, count(*) AS metric
    FROM  (
       SELECT timestamp FROM utilities
       UNION ALL
       SELECT timestamp FROM items
       ) sub
    GROUP  BY 1
    ORDER  BY 1;
    

    Clashing IDs are not relevant to this query. Simply append rows from both tables with UNION ALL (not just UNION!), use date_trunc() and aggregate.

    I removed a more complex query that would only make sense with (unlikely) expression indexes on underlying tables.

    Login or Signup to reply.
  2. count is an aggregate function which combines the values of multiple rows together. You have to tell Postgres how to group rows with a group by clause. You want to group by hour, so GROUP BY date_trunc('hour', timestamp) or GROUP BY 1.

    Also, you do not want to do a cross join which will result in each row in one being joined with every row in two. Instead you can union both tables together and then group, but this can’t take advantage of indexes. Instead, count both tables (which can use an index), then union and group and sum the counts together (only a handful of rows).

    WITH hours AS (
      SELECT 
      date_trunc('hour', recorded_at) as hour, 
      COUNT(*) as cnt
      FROM utilities
      GROUP BY 1
    
      UNION ALL
    
      SELECT 
      date_trunc('hour', recorded_at) as hour, 
      COUNT(*) as cnt
      FROM items
      GROUP BY 1
    )
    SELECT 
      hour,
      SUM(count) as cnt
    FROM hours
    GROUP BY hour
    ORDER BY hour;
    

    Other notes…

    • Need to order the subqueries, that’ll just slow things down.
    • Avoid naming columns using keywords like timestamp. They risk errors and don’t have a lot of descriptive power. Follow the thing_at for timestamps and thing_on for dates.
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search