skip to Main Content

I spend a lot of time solving this problem. So, i have table users:

at_timestamp users_count event_type
100000 2 1
100001 4 1
100003 5 0
100005 1 1
100006 3 1
100008 2 0
100008 1 1

etc.

So, event type 1 means IN, and 0 means OUT. I need to find timestamp, when the sum of users was the higest. In this example it was 100001, because 2 + 4 = 6. Then 5 are out and 1 left. Then 1 user in, 3 users are in, which means total users count at the 100006 ts was 5. Then in the same time, 2 leave and 1 in and so on. I am have tried to use window functions, but without IF/ELSE, or something, but it doesn’t work for me. I got something close to, but not exactly what i need.

Example of my postgres query

SELECT at_timestamp, sum(users_count) OVER (PARTITION BY event_type ORDER BY at_timestamp)
FROM users_events;

Then i have tried

SELECT at_timestamp, sum(users_count) OVER (partition by event_type ORDER BY at_timestamp) - LAG(users_count) OVER (PARTITION BY event_type) FROM users_events;

But it doesn’t work, because i need to subtract previous value, if it is out event. So, i need SQL query which will calculate it right 😅
I will be very grateful for your help!

3

Answers


  1. with tallied as (
        select *,
            sum(user_count
                  * case when event_type = 1
                         then 1 else -1 end) over (
                order by at_timestamp) as tally
        from T
    ), ranked as (
        select *,
            dense_rank() over (
                order by tally desc) as dr
        from tallied
    ) 
    select * from ranked where dr = 1;
    

    Because of the potential for ties in the ranking, this uses dense_rank() to filter for maximal output rows.

    Login or Signup to reply.
  2. Short and concise: a CASE WHEN multiplier as from @shawnt00 in a running sum expression using a window clause. I use a named window because I find it more readable and PostgreSQL allows it.

    And finally order by the running sum descending and applying a LIMIT 1:

    marco ~/1/Vertica/supp $ psql -af rsum.sql
    WITH
    -- Your in data ...
    indata(at_timestamp,users_count,event_type) AS (
              SELECT 100000,2,1
    UNION ALL SELECT 100001,4,1
    UNION ALL SELECT 100003,5,0
    UNION ALL SELECT 100005,1,1
    UNION ALL SELECT 100006,3,1
    UNION ALL SELECT 100008,2,0
    UNION ALL SELECT 100008,1,1
    )
    -- real query starts here ...
    SELECT 
      *
    , SUM(users_count * CASE event_type WHEN 1 THEN 1 ELSE -1 END) OVER w AS runsum
    FROM indata
    WINDOW w AS (ORDER BY at_timestamp)
    ORDER BY 4 DESC LIMIT 1;
     at_timestamp | users_count | event_type | runsum 
    --------------+-------------+------------+--------
           100001 |           4 |          1 |      6
    
    Login or Signup to reply.
  3. You can do this arithmetically with your initial approach, if you get rid of the partition by: demo

    SELECT at_timestamp, 
           sum(users_count*(2*event_type-1)) OVER (ORDER BY at_timestamp)
    FROM users_events
    ORDER BY at_timestamp;
    

    That’s because

    • event_type seems to be an integer
    • 2 * 0 - 1 is -1 and
    • 2 * 1 - 1 is 1

    Using this as a multiplier, event_type of 0 causes subtraction of corresponding users_count from the stepping sum, and event_type of 1 results in addition of the users_count

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search