How can I calculate a user's daily streak in postgresql?

JayAdra
January 4, 2024
235 views
0 votes
2 Answers

My goal

I’m trying to track and display a user’s daily streak posting to my app, but struggling to write a query that works reliably and returns an accurate count.

Some context

My app has a prompt and a post table. A user can submit one post for each prompt (the prompts are created daily, so one post per day, per user).

The prompt table, simplified, looks like this:

id	dateKey	text
1	20240101	This is an example prompt.
2	20240102	Here is a second prompt.

The post table, simplified, is something like:

id	content	promptId	authorId
50	This is my response to the prompt.	1	90
51	A second response to the same prompt.	1	91

Current query

I’ve tried a few different approaches to queries (using PARTITION BY, dense_rank() etc.) but could only get the user’s longest streak. The query I had if you’re interested:

select distinct on (p."authorId") count(distinct "dateKey"::date) as "streak"
from (select p.*,
      dense_rank() over (partition by p."authorId" order by "dateKey"::date) as seq
      from post p
      join prompt pt on p."promptId" = pt.id
     ) p
join prompt pt on p."promptId" = pt.id
where p."authorId" = 90
group by p."authorId", "dateKey"::date - seq * interval '1 day'
order by p."authorId", streak desc

This appears to work with the below data, but if you add a new "missed" prompt (which should reset the streak), this query would still return 2 (I think I understand why, but not sure how to correct it).

What I need

I basically need something to start at the latest prompt, and go down the list until it finds a prompt with no post from that user.

For example, this joined data would have a streak of 2:

id	dateKey	text	postContent	authorId
1	20240104	This is an example prompt.	This is my response.	90
2	20240103	Here is a second prompt.	A second response.	90
3	20240102	A third prompt.	null	null
4	20240101	My fourth prompt.	A third response, but I missed a day.	90

Functionality most important, but would be nice if it were performant too (prompt might have 1000 rows, post might have millions, and the streak might reach 1000).

I’m a bit lost with postgres/sql’s capabilities for things like this, so hopefully there’s a simple solution out there!

Fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/11431

Answers

- ORA01017
- January 4, 2024 at 8:02 am
- 0 votes
0
This seems like island and gap problem and gap problem.

I have created query for single authorid (p.authorId = 90) as asked in the question. You can remove the join condition to get the data for all the authorIds.

Solution is to use the window functions as follows:
```
select authorId, max(sm) from
(select t.*, 
        sum(case when is null then 1 end ) over (partition by p.authorId order by "dateKey"::date) as sm 
  from (select pt.*, p.*, 
               lag(p.promptId) over (partition by p.authorId order by "dateKey"::date) as prev_promptId
          from prompt pt
          left join post p on p.promptId = pt.id and p.authorId = 90) t ) t
group by authorId;
```
Login or Signup to reply.

- ErwinBrandstetter
- January 4, 2024 at 10:29 am
- 0 votes
0
For a single table with natural value progression there are simpler solutions. But for the combination of two tables with a (seemingly) arbitrary next promptId I expect a recursive CTE to perform best:
```
WITH RECURSIVE cte AS (
   SELECT CURRENT_DATE AS the_day, p."authorId" AS author_id
   FROM   promt pt
   JOIN   post  p ON p."promptId" = pt.id
   WHERE  pt."dateKey" = CURRENT_DATE
   AND    p."authorId" = 90  -- your author here!
   
   UNION ALL
   SELECT c.the_day - 1, p."authorId"   -- assuming no gaps in prompt!
   FROM   cte   c
   JOIN   promt pt ON pt."dateKey" = c.the_day - 1
   JOIN   post  p  ON p."promptId" = pt.id
   WHERE  p."authorId" = c.author_id
   )
SELECT count(*)
FROM   cte;
```
Absolutely requires index support to be fast.
Ideally, one index on prompt("dateKey", id), and one on post("authorId", "promptId").

Assuming …
- … we query for one given user,
- … we start "today",
- … no gaps in prompt – exactly one entry per day,
- … prompt."dateKey" is type date (as it should be).
Related:
- WITH RECURSIVE query to choose the longest paths
DB design

If there is at most one prompt per day, consider using the date (data type date!) as PK in table prompt and FK in table post. Allows for much simpler queries.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.