So I have 4 tables:
- posts
- likes
- comments
- post_read_count
Each table has a `post_id` column.
I want to create a query where, for each post, I get the number of likes, the number of comments, and the number of reads.
Finally, I want another column called `score`, which sums up the values in likes, comments, and reads.
Here is the SQL query I have so far:
```sql
SELECT posts.post_id,
       posts.title,
       count(likes.like_id) AS likes,
       count(comments.comment_id) AS comments,
       post_read_count.count AS reads,
       (count(likes.like_id) + count(comments.comment_id) + post_read_count.count) AS score
FROM community.posts
LEFT JOIN community.likes
       ON posts.post_id = likes.post_id
LEFT JOIN community.comments
       ON posts.post_id = comments.post_id
LEFT JOIN community.post_read_count
       ON posts.post_id = post_read_count.post_id
WHERE posts.is_deleted = false
GROUP BY posts.post_id, post_read_count.count
ORDER BY posts.post_id DESC
LIMIT 100;
```
But I am unable to get the correct number of comments, and the sum of values in `score` is not working properly.
I am on Postgres.
Here's some sample data:

posts:

post_id | title
101 | some title
102 | hello there
103 | good day sir
104 | good bye
105 | whats up
likes:

like_id | post_id
1 | 101
2 | 101
3 | 101
4 | 102
5 | 102
6 | 104
7 | 104
8 | 105
9 | 105
10 | 101
comments:

comment_id | post_id
1 | 103
2 | 103
3 | 103
4 | 101
5 | 102
6 | 104
7 | 105
8 | 105
9 | 105
10 | 103
post_read_count:

post_id | count
101 | 12
102 | 54
103 | 76
104 | 23
105 | 87
Desired output:

post_id | title | likes | comments | reads | score
101 | some title | 4 | 1 | 12 | 17
102 | hello there | 2 | 1 | 54 | 57
103 | good day sir | 0 | 4 | 76 | 80
104 | good bye | 2 | 1 | 23 | 26
105 | whats up | 2 | 3 | 87 | 92
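For anyone who wants to reproduce this locally, the following is a sketch of a DDL and seed script matching the sample data above. The column types and the `is_deleted` column are assumptions inferred from the query in the question, not something the question specifies.

```sql
CREATE SCHEMA IF NOT EXISTS community;

CREATE TABLE community.posts (
    post_id    integer PRIMARY KEY,
    title      text,
    is_deleted boolean NOT NULL DEFAULT false  -- inferred from the WHERE clause
);

CREATE TABLE community.likes (
    like_id integer PRIMARY KEY,
    post_id integer REFERENCES community.posts
);

CREATE TABLE community.comments (
    comment_id integer PRIMARY KEY,
    post_id    integer REFERENCES community.posts
);

CREATE TABLE community.post_read_count (
    post_id integer PRIMARY KEY REFERENCES community.posts,
    count   integer
);

INSERT INTO community.posts (post_id, title) VALUES
  (101, 'some title'), (102, 'hello there'), (103, 'good day sir'),
  (104, 'good bye'), (105, 'whats up');

INSERT INTO community.likes (like_id, post_id) VALUES
  (1, 101), (2, 101), (3, 101), (4, 102), (5, 102),
  (6, 104), (7, 104), (8, 105), (9, 105), (10, 101);

INSERT INTO community.comments (comment_id, post_id) VALUES
  (1, 103), (2, 103), (3, 103), (4, 101), (5, 102),
  (6, 104), (7, 105), (8, 105), (9, 105), (10, 103);

INSERT INTO community.post_read_count (post_id, count) VALUES
  (101, 12), (102, 54), (103, 76), (104, 23), (105, 87);
```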
2 Answers
Because you have one-to-many relationships, I would use two CTEs to get your aggregate counts. You'll need to `COALESCE` because some posts may have no likes or comments at all, so the joined counts come back NULL, and you cannot add a NULL to an integer.
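The query itself did not survive the copy from the fiddle; here is a sketch of the two-CTE approach described above, assuming the schema shown in the question:

```sql
WITH like_counts AS (
    SELECT post_id, count(*) AS likes
    FROM community.likes
    GROUP BY post_id
),
comment_counts AS (
    SELECT post_id, count(*) AS comments
    FROM community.comments
    GROUP BY post_id
)
SELECT p.post_id,
       p.title,
       COALESCE(lc.likes, 0)    AS likes,
       COALESCE(cc.comments, 0) AS comments,
       COALESCE(prc.count, 0)   AS reads,
       COALESCE(lc.likes, 0)
         + COALESCE(cc.comments, 0)
         + COALESCE(prc.count, 0) AS score
FROM community.posts p
LEFT JOIN like_counts lc              ON p.post_id = lc.post_id
LEFT JOIN comment_counts cc           ON p.post_id = cc.post_id
LEFT JOIN community.post_read_count prc ON p.post_id = prc.post_id
WHERE p.is_deleted = false
ORDER BY p.post_id DESC
LIMIT 100;
```

Because each CTE aggregates its own table down to one row per `post_id` before the join, the joins are all one-to-at-most-one and the counts cannot multiply each other.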
The trick is that each additional join multiplies the results of the joins before it. It helps to think through what the engine does to build the result set.
Let's say you run this when you have only one post, but that post has received 3 likes and 2 comments. After the first join from `posts` to `likes`, and before the `GROUP BY`, the result set so far has 3 rows (reconstructed here with placeholder IDs):

post_id | like_id
1 | 1
1 | 2
1 | 3

Next we join to `comments`. This join starts from the set we built previously, not from the base `posts` table, so each of the three rows above picks up its own copy of the comments. We end up with this:

post_id | like_id | comment_id
1 | 1 | 1
1 | 1 | 2
1 | 2 | 1
1 | 2 | 2
1 | 3 | 1
1 | 3 | 2

You can see how each additional join builds up more rows multiplicatively. But we're not done yet: there's still the `GROUP BY` operation to handle. When we group by `post_id`, we're grouping the entire set from the previous step, not just the posts table. Taking `COUNT(like_id)` within this group, anything that's not NULL still counts: the like with id 1 is counted two times (once for each comment), and likewise likes 2 and 3, so you get 6 instead of 3. It's similar for the comments: comments 1 and 2 are each counted three times (once for each like), so you get 6 instead of 2.

Thankfully, `post_read_count` looks like it is 1:1 or 1:0 with the posts table, so it won't add to the issue. With that in mind, for this query, most of the issue can be resolved by adding `DISTINCT` inside the aggregate functions. (With appreciation to @isolated for saving me from having to build the DDL script myself.)
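The corrected query did not survive the copy either; here is a sketch of the `DISTINCT` approach just described, using the question's schema (the table aliases are my own):

```sql
SELECT p.post_id,
       p.title,
       count(DISTINCT l.like_id)    AS likes,
       count(DISTINCT c.comment_id) AS comments,
       COALESCE(prc.count, 0)       AS reads,
       count(DISTINCT l.like_id)
         + count(DISTINCT c.comment_id)
         + COALESCE(prc.count, 0)   AS score
FROM community.posts p
LEFT JOIN community.likes l             ON p.post_id = l.post_id
LEFT JOIN community.comments c          ON p.post_id = c.post_id
LEFT JOIN community.post_read_count prc ON p.post_id = prc.post_id
WHERE p.is_deleted = false
GROUP BY p.post_id, p.title, prc.count
ORDER BY p.post_id DESC
LIMIT 100;
```

`count(DISTINCT …)` collapses the duplicated IDs produced by the multiplied join rows, and it ignores NULLs, so posts with no likes or comments correctly count as 0.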
This ought to be significantly more efficient than the CTE-based solution. However, some queries will require more drastic steps than just adding `DISTINCT` to the aggregate functions: either nesting subqueries or using CTEs to compute the aggregate values in isolation, so they don't compound with rows from other tables. In that situation, understanding how the other answer works will be important.

Also note how I used table alias mnemonics and included ALL the non-aggregated columns in the `GROUP BY`. Both of those are good practices you should follow in pretty much every query.

Finally, a complete side issue here: it seems very strange to weight a read the same as a like or a comment.
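If you did want to weight them differently, only the `score` expression needs to change; for example (the multipliers here are arbitrary placeholders, not a recommendation):

```sql
-- Hypothetical weighting: a like worth 3 reads, a comment worth 5
count(DISTINCT l.like_id) * 3
  + count(DISTINCT c.comment_id) * 5
  + COALESCE(prc.count, 0) AS score
```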