skip to Main Content

I wrote two queries and thought they would give the same result, but it didn’t happen. Could you please explain to me why?
1)

SELECT date, COUNT(DISTINCT user_id) 
FROM (SELECT user_id, date(time) as date, COUNT(order_id)
                        FROM user_actions
                        WHERE order_id NOT IN (SELECT order_id FROM user_actions WHERE action = 'cancel_order')
                        GROUP BY user_id,date(time)
                        HAVING COUNT(order_id) = 1)t1
GROUP BY date

2)

SELECT date(time) as date, COUNT(DISTINCT user_id)
FROM user_actions
WHERE user_id IN (SELECT user_id FROM (SELECT user_id, date(time), COUNT(order_id)
                        FROM user_actions
                        WHERE order_id NOT IN (SELECT order_id FROM user_actions WHERE action = 'cancel_order')
                        GROUP BY user_id,date(time)
                        HAVING COUNT(order_id) = 1)t1)
GROUP BY date(time) 

2

Answers


  1. In 1), the subquery t1 is retrieving all the users that had one (and only one) action per day of type other than ‘cancel_action’ and the day of this action, then the top query is counting how many of these users existed each day.

    In 2), the subquery t1 is doing exactly the same as in 1), but then you are getting the user_ids and counting users per day of action. The problem is, even if the users exist in the set of users that had at most one action per day at some date, that doesn’t mean that some of these users could have had more than 1 action on the same day in different dates.

    Example:

    user_id = 1

    • 1 action on 02/01/2024
    • 1 action on 02/06/2024

    user_id = 2

    • 1 action on 02/01/2024
    • 1 action on 02/04/2024
    • 2 actions on 02/05/2024

    t1 returns:

    user_id date count
    1 02/01/2024 1
    1 02/06/2024 1
    2 02/01/2024 1
    2 02/04/2024 1

    Then, in 1) counting from t1 by date results in:

    date count
    02/01/2024 2
    02/04/2024 1
    02/06/2024 1

    But in 2) you will get that extra date 02/05/2024 when user_id 2 made two actions, so you will get:

    date count
    02/01/2024 2
    02/04/2024 1
    02/05/2024 1
    02/06/2024 1
    Login or Signup to reply.
  2. it’s a matter of filtering.
    Your first query will only consider users with one order not cancelled per date.
    Then count the distinct users per date.

    Your second query will retrieve all users with one order not cancelled per date.
    Then for each of these users count the distinct users per date.

    So, imagine the following:

    ID User Date Order Action
    1 User2 Date1 OrderID1 action1
    2 User1 Date2 OrderID2 cancel_order
    3 User1 Date1 OrderID3 action1

    Your first query will return

    Date Count
    Date1 2

    Record #2 is excluded.

    Your second query will return

    Date Count
    Date1 2
    Date2 1

    As User1 has a legit order on Date1, the order on Date2 is also counted. Record #2 is no more filtered out.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search