So, I am trying to create a query that counts the numbers of Redirects per week per Site. I made the below query which gives me almost the correct results, but if there are no Redirects, there are no count results (meaning no 0, no None, just no result). I have tried putting the coalese outside the count, but same result.
How can I make this query give me None or 0 when there are no counts?
date_series = db.func.generate_series(min_date , todays_date, timedelta(weeks=1))
trunc_date = db.func.date_trunc('week', date_series)
subquery = db.session.query(trunc_date.label('week')).subquery()
query = db.session.query(subquery.c.week, Site, db.func.count(db.func.coalesce(Redirect.id, 0)))
.outerjoin(Site, subquery.c.week == db.func.date_trunc('week', Redirect.timestamp))
.outerjoin(Site, Redirect.site_id == Site.id)
.group_by(subquery.c.week, Site.id)
counts = query.all()
Edit:
So I watered down my query even more, removing Site from the grouping, swapped ‘week’ with ‘day’, and instead trying to get desired result for only one Site. However, it still doesn’t work and it gives me results as if I was doing a join, and not an outerjoin. I really don’t understand why it won’t give me 0 count results.
See updated query:
query = db.session.query(subquery.c.day, db.func.count(Redirect.id))
.outerjoin(Redirect, subquery.c.day == db.func.date(Redirect.timestamp))
.filter(Redirect.site_id == 82)
.group_by(subquery.c.day)
It still gives me results for only the days where there are values:
0:(datetime.datetime(2022, 11, 23, 0, 0), 1)
1:(datetime.datetime(2023, 12, 7, 0, 0), 1)
And the generated SQL:
SELECT anon_1.day AS anon_1_day, count(redirect.id) AS count_1
FROM (SELECT date_trunc(%(date_trunc_1)s, generate_series(%(generate_series_1)s, %(generate_series_2)s, %(generate_series_3)s)) AS day) AS anon_1 LEFT OUTER JOIN redirect ON anon_1.day = date(redirect.timestamp)
WHERE redirect.site_id = %(site_id_1)s GROUP BY anon_1.day
2
Answers
I believe I figured it out.
The initial subquery is equivalent to the one I showed in my question, except I changed week to day.
Now looks like this:
Then I had to do a second subquery to generate the crossproduct between the dates in the first subquery and Site, meaning all possible combinations of the two. I was not able to make the cross product in the initial subquery as it kept complaining that it couldn't make the join.
So here is the cross product query:
Then I could use this to join in the count of Redirects by outerjoining on both date and site_id simultanously.
The query looks like this:
This now gives me counts for each day, and each Site, with 0's where no Redirect records.
Your current query is using outerjoin, which is correct for ensuring that all dates in your date_series are included, even if there are no corresponding Redirect records. However, the way you’re counting the Redirect.id might need a slight adjustment.
Instead, you should count Redirect.id directly, and the count() function will return 0 if there are no non-null Redirect.id values (which is the case when there’s no matching Redirect record due to the outerjoin).
Modify your query like this:
This query should now correctly count the number of Redirect.id for each week and site, returning 0 when there are no matching Redirect records.