My goal
I’m trying to track and display a user’s daily streak posting to my app, but struggling to write a query that works reliably and returns an accurate count.
Some context
My app has a prompt
and a post
table. A user can submit one post for each prompt (the prompts are created daily, so one post per day, per user).
The prompt
table, simplified, looks like this:
id | dateKey | text |
---|---|---|
1 | 20240101 | This is an example prompt. |
2 | 20240102 | Here is a second prompt. |
The post
table, simplified, is something like:
id | content | promptId | authorId |
---|---|---|---|
50 | This is my response to the prompt. | 1 | 90 |
51 | A second response to the same prompt. | 1 | 91 |
Current query
I’ve tried a few different approaches to queries (using PARTITION BY
, dense_rank()
etc.) but could only get the user’s longest streak. The query I had if you’re interested:
select distinct on (p."authorId") count(distinct "dateKey"::date) as "streak"
from (select p.*,
dense_rank() over (partition by p."authorId" order by "dateKey"::date) as seq
from post p
join prompt pt on p."promptId" = pt.id
) p
join prompt pt on p."promptId" = pt.id
where p."authorId" = 90
group by p."authorId", "dateKey"::date - seq * interval '1 day'
order by p."authorId", streak desc
This appears to work with the below data, but if you add a new "missed" prompt (which should reset the streak), this query would still return 2 (I think I understand why, but not sure how to correct it).
What I need
I basically need something to start at the latest prompt, and go down the list until it finds a prompt with no post from that user.
For example, this joined data would have a streak of 2:
id | dateKey | text | postContent | authorId |
---|---|---|---|---|
1 | 20240104 | This is an example prompt. | This is my response. | 90 |
2 | 20240103 | Here is a second prompt. | A second response. | 90 |
3 | 20240102 | A third prompt. | null | null |
4 | 20240101 | My fourth prompt. | A third response, but I missed a day. | 90 |
Functionality most important, but would be nice if it were performant too (prompt
might have 1000 rows, post
might have millions, and the streak
might reach 1000).
I’m a bit lost with postgres/sql’s capabilities for things like this, so hopefully there’s a simple solution out there!
Fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/11431
2
Answers
This seems like island and gap problem and gap problem.
I have created query for single
authorid
(p.authorId = 90
) as asked in the question. You can remove the join condition to get the data for all theauthorIds
.Solution is to use the window functions as follows:
For a single table with natural value progression there are simpler solutions. But for the combination of two tables with a (seemingly) arbitrary next
promptId
I expect a recursive CTE to perform best:Absolutely requires index support to be fast.
Ideally, one index on
prompt("dateKey", id)
, and one onpost("authorId", "promptId")
.Assuming …
prompt."dateKey"
is typedate
(as it should be).Related:
DB design
If there is at most one prompt per day, consider using the date (data type
date
!) as PK in tableprompt
and FK in tablepost
. Allows for much simpler queries.