skip to Main Content

My table consists of user_id, revenue, publish_month columns.
Right now I use group_by user_id and sum(revenue) to get revenue for all individual users.

Is there a single SQL query I can use to query for user revenue across a time period conditionally? If for a specific user, there is a row for this month, I want to query for this month, last month and the month before. If there is not yet a row for this month, I want to query for last month and the two months before.

Any advice with which approach to take would be helpful. If I should be using cases, if-elses with exists or if this is do-able with a single SQL query?

UPDATE—since I did a bad job of describing the question, I’ve come to include some example data and expected results

Where current month is not present for user 33

Where current month is present

2

Answers


  1. You can use a windowing function to "number" the months. In this way the most recent one will have a value of 1, the prior 2, and the one before 3. Then you can only select the items with a number of 3 or less.

    Here is how:

    SELECT user_id, revienue, publish_month, 
       ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
    FROM yourtableyoudidnotname
    

    now you just select the items with RN less than 3 and do your sum

    SELECT user_id, SUM(revenue) as s_revenue
    FROM (
      SELECT user_id, revenue, publish_month, 
         ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
      FROM yourtableyoudidnotname
    ) X 
    WHERE RN <= 3
    GROUP BY user_id
    

    You could also do this without a sub query if you use the windowing function for SUM and a range, but I think this is easier to understand.


    From the comment — there could be an issue if you have months from more than one year. To solve this make the biggest number in the order by always the most recent. so instead of

     ORDER BY publish_month DESC 
    

    you would have

     ORDER BY (100*publish_year)+publish_month DESC
    

    This means more recent years will always have a higher number so january of 2023 will be 202301 while december of 2022 will be 202212. Since january is a bigger number it will get a row number of 1 and december will get a row number of 2.

    Login or Signup to reply.
  2. Assuming publish_month is a DATE datatype, this should get the most recent three months of data per user…

    SELECT
      user_id, SUM(revenue) as s_revenue
    FROM
    (
      SELECT
        user_id, revenue, publish_month, 
        MAX(publish_month) OVER (PARTITION BY user_id)  AS user_latest_publish_month
      FROM
        yourtableyoudidnotname
    )
      summarised
    WHERE
      publish_month >= DATEADD(month, -2, user_latest_publish_month)
    GROUP BY
      user_id
    

    If you want to limit that to the most recent 3 months out of the last 4 calendar months, just add AND publish_month >= DATEADD(month, -3, DATE_TRUNC(month, GETDATE()))

    • The ambiguity here is why it is important to include a Minimal Reproducible Example
    • With input data and require results, we could test our code against your requirements

    If you’re using strings for the publish_month, you shouldn’t be, and should fix that with utmost urgency.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search