skip to Main Content

My goal

I’m trying to track and display a user’s daily streak posting to my app, but struggling to write a query that works reliably and returns an accurate count.

Some context

My app has a prompt and a post table. A user can submit one post for each prompt (the prompts are created daily, so one post per day, per user).

The prompt table, simplified, looks like this:

id dateKey text
1 20240101 This is an example prompt.
2 20240102 Here is a second prompt.

The post table, simplified, is something like:

id content promptId authorId
50 This is my response to the prompt. 1 90
51 A second response to the same prompt. 1 91

Current query

I’ve tried a few different approaches to queries (using PARTITION BY, dense_rank() etc.) but could only get the user’s longest streak. The query I had if you’re interested:

select distinct on (p."authorId") count(distinct "dateKey"::date) as "streak"
from (select p.*,
      dense_rank() over (partition by p."authorId" order by "dateKey"::date) as seq
      from post p
      join prompt pt on p."promptId" = pt.id
     ) p
join prompt pt on p."promptId" = pt.id
where p."authorId" = 90
group by p."authorId", "dateKey"::date - seq * interval '1 day'
order by p."authorId", streak desc

This appears to work with the below data, but if you add a new "missed" prompt (which should reset the streak), this query would still return 2 (I think I understand why, but not sure how to correct it).

What I need

I basically need something to start at the latest prompt, and go down the list until it finds a prompt with no post from that user.

For example, this joined data would have a streak of 2:

id dateKey text postContent authorId
1 20240104 This is an example prompt. This is my response. 90
2 20240103 Here is a second prompt. A second response. 90
3 20240102 A third prompt. null null
4 20240101 My fourth prompt. A third response, but I missed a day. 90

Functionality most important, but would be nice if it were performant too (prompt might have 1000 rows, post might have millions, and the streak might reach 1000).

I’m a bit lost with postgres/sql’s capabilities for things like this, so hopefully there’s a simple solution out there!

Fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/11431

2

Answers


  1. This seems like island and gap problem and gap problem.

    I have created query for single authorid (p.authorId = 90) as asked in the question. You can remove the join condition to get the data for all the authorIds.

    Solution is to use the window functions as follows:

    select authorId, max(sm) from
    (select t.*, 
            sum(case when is null then 1 end ) over (partition by p.authorId order by "dateKey"::date) as sm 
      from (select pt.*, p.*, 
                   lag(p.promptId) over (partition by p.authorId order by "dateKey"::date) as prev_promptId
              from prompt pt
              left join post p on p.promptId = pt.id and p.authorId = 90) t ) t
    group by authorId;
    
    Login or Signup to reply.
  2. For a single table with natural value progression there are simpler solutions. But for the combination of two tables with a (seemingly) arbitrary next promptId I expect a recursive CTE to perform best:

    WITH RECURSIVE cte AS (
       SELECT CURRENT_DATE AS the_day, p."authorId" AS author_id
       FROM   promt pt
       JOIN   post  p ON p."promptId" = pt.id
       WHERE  pt."dateKey" = CURRENT_DATE
       AND    p."authorId" = 90  -- your author here!
       
       UNION ALL
       SELECT c.the_day - 1, p."authorId"   -- assuming no gaps in prompt!
       FROM   cte   c
       JOIN   promt pt ON pt."dateKey" = c.the_day - 1
       JOIN   post  p  ON p."promptId" = pt.id
       WHERE  p."authorId" = c.author_id
       )
    SELECT count(*)
    FROM   cte;
    

    Absolutely requires index support to be fast.
    Ideally, one index on prompt("dateKey", id), and one on post("authorId", "promptId").

    Assuming …

    • … we query for one given user,
    • … we start "today",
    • … no gaps in prompt – exactly one entry per day,
    • prompt."dateKey" is type date (as it should be).

    Related:

    DB design

    If there is at most one prompt per day, consider using the date (data type date!) as PK in table prompt and FK in table post. Allows for much simpler queries.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search