skip to Main Content

Edited to clarify and simplify

I have the following query

select 
  Signups.user_id,
  Count(timeoutTable.user_id) as timeoutCount,
  Count(confirmedTable.user_id) as confirmedCount
from Signups
left join (
  select * from Confirmations where action = 'timeout'
) as timeoutTable on signups.user_id = timeoutTable.user_id
left join (
  select * from Confirmations where action = 'confirmed'
) as confirmedTable on signups.user_id = confirmedTable.user_id
group by 
  Signups.user_id,
  timeoutTable.user_id,
  confirmedTable.user_id

When run with the following input

Signups Table

| user_id | time_stamp          |
| ------- | ------------------- |
| 15      | 2020-07-31 18:26:35 |
| 16      | 2021-05-20 01:38:09 |
| 7       | 2020-08-02 08:45:14 |
| 10      | 2020-06-24 17:13:14 |
| 5       | 2020-06-27 17:59:29 |
| 9       | 2021-11-08 03:05:14 |
| 8       | 2021-12-13 03:38:58 |
| 12      | 2020-09-16 11:17:39 |


Confirmations Table

| user_id | time_stamp          | action    |
| ------- | ------------------- | --------- |
| 7       | 2020-03-31 13:11:43 | timeout   |
| 7       | 2021-03-25 07:40:25 | timeout   |
| 8       | 2020-07-27 19:43:25 | confirmed |
| 8       | 2021-03-07 19:48:06 | timeout   |
| 7       | 2020-01-24 15:43:47 | confirmed |

It outputs:

| user_id | timeoutCount | confirmedCount |
| ------- | ------------ | -------------- |
| 15      | 0            | 0              |
| 16      | 0            | 0              |
| 7       | 2            | 2              |
| 10      | 0            | 0              |
| 5       | 0            | 0              |
| 9       | 0            | 0              |
| 8       | 1            | 1              |
| 12      | 0            | 0              |

I am trying to make it so that user_id 7 has a confirmed of 1 and a timeout of 2 but it sets both values to 2 for some reason. Any help would be greatly appreciated.

2

Answers


  1. Your problem is that you are joining all timeouts to all confirmations. So, for a signup with 2 timeouts and 3 confirmations, you’ll get 2 x 3 = 6 rows that you then aggregate.

    When aggregating more than one table, aggregate before joining:

    select 
      user_id, 
      round(case when total_messages = 0 then 0 else total_confirmed / total_messages end, 2) as confirmation_rate
    from
    (
      select 
        s.user_id, 
        t.total_timeouts,
        c.total_confirmed,
        coalesce(t.timeoutCount,0) + coalesce(c.total_confirmed, 0) as total_messages
      from signups s
      left join 
      (
        select user_id, count(*) as total_timeouts
        from confirmations
        where action = 'timeout'
        group by user_id
      ) as t on t.user_id = s.user_id
      left join 
      (
        select user_id, count(*) as total_confirmed
        from confirmations
        where action = 'confirmed'
        group by user_id
      ) as c on c.user_id = s.user_id
    )
    order by user_id;
    

    An alternative to this is conditional aggregation here:

    select 
      user_id, 
      round(case when total_messages = 0 then 0 else total_confirmed / total_messages end, 2) as confirmation_rate
    from
    (
      select 
        s.user_id, 
        sum(c.action = 'timeout') as total_timeouts,
        sum(c.action = 'confirmed') as total_confirmed,
        sum(c.action = 'timeout') - sum(c.action = 'confirmed') as total_messages
      from signups s
      left join confirmations c on c.user_id = s.user_id
      group by s.user_id
    )
    order by user_id;
    
    Login or Signup to reply.
  2. To see what’s happening, look at the subquery without the group by.

    select 
      Signups.user_id,
      timeoutTable.*,
      confirmedTable.*
    from Signups
    left join (
      select * from Confirmations where action = 'timeout'
    ) as timeoutTable on Signups.user_id = timeoutTable.user_id
    left join (
      select * from Confirmations where action = 'confirmed'
    ) as confirmedTable on Signups.user_id = confirmedTable.user_id
    
                timeoutTable                                    confirmedTable
    user_id     user_id     time_stamp              action      user_id     time_stamp              action
    15          null        null                    null        null        null                    null
    16          null        null                    null        null        null                    null
    7           7           2021-03-25 07:40:25     timeout     7           2020-01-24 15:43:47     confirmed
    7           7           2020-03-31 13:11:43     timeout     7           2020-01-24 15:43:47     confirmed
    10          null        null                    null        null        null                    null
    5           null        null                    null        null        null                    null
    9           null        null                    null        null        null                    null
    8           8           2021-03-07 19:48:06     timeout     8           2020-07-27 19:43:25     confirmed
    12          null        null                    null        null        null                    null
    

    Note how user_id 7 has 2 rows and each have a timeout and a confirmation. These rows are counted as both a timeout and a confirmation. This is because you’re joining with two tables at the same time.

    If we instead do the aggregations first and then join, the counts work out.

    select 
      Signups.user_id,
      timeoutTable.*,
      confirmedTable.*
    from Signups
    left join (
      select user_id, count(user_id) as timeouts
      from Confirmations
      where action = 'timeout'
      group by user_id
    ) as timeoutTable on Signups.user_id = timeoutTable.user_id
    left join (
      select user_id, count(user_id) as confirmations
      from Confirmations
      where action = 'confirmed'
      group by user_id
    ) as confirmedTable on Signups.user_id = confirmedTable.user_id
    
                timeoutTable            confirmedTable
    user_id     user_id     timeouts    user_id     confirmations
    15          null        null        null        null
    16          null        null        null        null
    7           7           2           7           1
    10          null        null        null        null
    5           null        null        null        null
    9           null        null        null        null
    8           8           1           8           1
    12          null        null        null        null
    

    Demonstration


    Note:

    left join (
      select * from Confirmations where action = 'timeout'
    ) as timeoutTable on Signups.user_id = timeoutTable.user_id
    

    is better written as without the subquery.

    left join Confirmations as timeoutTable
      on Signups.user_id = timeoutTable.user_id
     and where action = 'timeout'
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search