skip to Main Content

I have two SQL queries that output the same kind of output and have the same grouping and order :

select date_trunc('month', inserted_at)::date as date, count(id) from payment_logs where payment_logs.event_name = 'subscription_created' group by date order by date desc;

enter image description here

select date_trunc('month', inserted_at)::date as date, count(id) from users group by date order by date desc;

enter image description here

I would like to join those two results based on the calculated date field (which is the month), and have a result with 3 columns : date, count_users and count_payment_logs.

How can I achieve that? Thanks.

2

Answers


  1. Something like this

    select plog.date as odata, usr.cntusr, plog.cntlog 
    from (
        select date_trunc('month', inserted_at)::date as date, count(id) cntlog
        from payment_logs 
        where payment_logs.event_name = 'subscription_created' 
        group by date order by date desc
    ) plog
    join (
        select date_trunc('month', inserted_at)::date as date, count(id) cntusr
        from users 
        group by date 
    ) usr on plog.data = usr.data
    order by odata desc
    
    Login or Signup to reply.
  2. Nothing wrong with the accepted answer, but I wanted to show an alternative and add some color. Instead of subqueries, you can also use common table expressions (CTEs) which improve readability but also have some other features as well. Here is an example using CTEs:

    with payments as (
      select
        date_trunc('month', inserted_at)::date as date,
        count(id) as payment_count
      from payment_logs
      where
        event_name = 'subscription_created'
      group by date
    ),
    users as (
      select
        date_trunc('month', inserted_at)::date as date,
        count(id) as user_count
      from users
      group by date
    )
    select
      p.date, p.payment_count, u.user_count
    from
      payments p
      join users u on
        p.date = u.date
    order by
      p.date desc
    

    In my opinion the abstraction is neater and makes the code much easier to follow (and thus maintain).

    Other notes:

    The order by is expensive, and you can avoid it within each of the subqueries/CTEs since it’s being done at the end anyway. The ones in the subqueries will be clobbered by whatever you do in the main query anyway, so just omit them completely. Your results will not differ, and your query will be more efficient.

    In this example, you probably don’t have any missing months, but it’s possible… especially if you expand this concept to future queries. In such a case, you may want to consider a full outer join instead of an inner join (you have months that appear in the users that may not be in the payments or vice versa):

    select
      coalesce (p.date, u.date) as date,
      p.payment_count, u.user_count
    from
      payments p
      full outer join users u on
        p.date = u.date
    order by
      1 desc
    

    Another benefit of CTEs vs subqueries is that you can reuse them. In this example, I want to mimic the full outer join concept but with one additional twist — I have data from another table by month that I want in the query. The CTE lets me use the CTE for "payments" and "users" as many times as I want. Here I use them in the all_dates CTE and again in the main query. By creating "all_dates" I can now use left joins and avoid weird coalescing in joins (not wrong, just ugly).

    with payments as (
      -- same as above
    ),
    users as (
      -- same as above
    ),
    all_dates as (
      select date from payments  -- referred to payments here
      union
      select date from users
    )
    select
      a.date, ac.days_in_month, p.payment_count, u.user_count
    from
      all_dates a
      join accounting_calendar ac on
        a.date = ac.accounting_month
      left join payments p on -- referred to it here again, same CTE
        a.date = p.date
      left join users u on
        a.date = u.date
    order by
      p.date desc
    

    The point is you can reuse the CTEs.

    A final advantage is that you can declare the CTE materialized or non-materialized (default). The materialized CTE will essentially pre-process and store the results, which in certain cases may have better performance. A non-materialized on, on the other hand, will mimic a standard subquery which is nice because you can pass where clause conditions from outside the query to inside of it.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search