skip to Main Content

So I have 4 tables:

  1. post
  2. likes
  3. comments
  4. reads

Each table has the column post_id

I want to create a query, where for each post, i want to know the number of likes, the number of comments, and the number of reads
Finally, i want another column caled score, where i sum up the values in likes, comments and reads

Here is the sql query i have to do that:

SELECT posts.post_id,
       posts.title,
       count(likes.like_id) as likes,
       count(comments.comment_id) as comments,
       post_read_count.count as reads,
       (count(likes.like_id) + count(comments.comment_id) + post_read_count.count) as score
FROM community.posts
LEFT JOIN community.likes
    ON posts.post_id = likes.post_id
LEFT JOIN community.comments
    ON posts.post_id = comments.post_id
LEFT JOIN community.post_read_count
    ON posts.post_id = post_read_count.post_id
WHERE posts.is_deleted = false
GROUP BY posts.post_id, post_read_count.count
ORDER BY posts.post_id DESC
LIMIT 100

but i am unable to get the correct number of comments and the sum of items into score is not working properly

I am on postgres

heres a sample data:

posts:
post_id | title
101       some title
102       hello there
103       good day sir
104       good bye         
105       whats up 


likes:
like_id | post_id
1         101  
2         101
3         101
4         102
5         102
6         104
7         104
8         105
9         105
10        101

comments:
comment_id | post_id
1            103  
2            103
3            103
4            101
5            102
6            104
7            105
8            105
9            105
10           103

post_read_count:
post_id | count
101       12
102       54
103       76
104       23         
105       87 

desired output:

output:
post_id | title         | likes | comments | reads | score
101       some title      4       1          12      17
102       hello there     2       1          54      57
103       good day sir    0       4          76      80
104       good bye        2       1          23      26
105       whats up        2       3          87      92

2

Answers


  1. Because you have one-to-many, I would use two CTEs to get your aggregate counts. You’ll need to coalesce because some counts may be zero, and you cannot add a NULL with an integer.

    Schema (PostgreSQL v13)

    create table posts (
      post_id integer, 
      title varchar(20)
      );
    
    insert into posts values 
    (101, 'some title'),
    (102, 'hello there'),
    (103, 'good day sir'),
    (104, 'good bye'),    
    (105, 'whats up');
    
    create table likes (
      like_id integer, 
      post_id integer
      );
      
    insert into likes values 
    (1, 101),
    (2, 101),
    (3, 101),
    (4, 102),
    (5, 102),
    (6, 104),
    (7, 104),
    (8, 105),
    (9, 105),
    (10, 101);
    
    create table comments (
      comment_id integer, 
      post_id integer
      );
      
    insert into comments values 
    (1, 103),
    (2, 103),
    (3, 103),
    (4, 101),
    (5, 102),
    (6, 104),
    (7, 105),
    (8, 105),
    (9, 105),
    (10, 103);
    
    create table post_read_count (
      post_id integer, 
      pcount integer
      );
      
    insert into post_read_count values 
    (101, 12),
    (102, 54),
    (103, 76),
    (104, 23),      
    (105, 87);
    

    Query #1

    with cte_likes as (
      select post_id, count(*) as total_likes
      from likes
      group by post_id
      ), 
      cte_comments as (
        select post_id, count(*) as total_comments
        from comments
        group by post_id
        )
    select p.post_id, 
      p.title,
      coalesce(l.total_likes, 0) as likes, 
      coalesce(c.total_comments, 0) as comments, 
      coalesce(prc.pcount, 0) as reads, 
      coalesce(l.total_likes, 0) + coalesce(c.total_comments, 0) + coalesce(prc.pcount, 0) as score
    from posts p
    left join cte_likes l
      on p.post_id = l.post_id
    left join cte_comments c
      on p.post_id = c.post_id
    left join post_read_count prc
      on p.post_id = prc.post_id;
    
    post_id title likes comments reads score
    101 some title 4 1 12 17
    102 hello there 2 1 54 57
    103 good day sir 0 4 76 80
    104 good bye 2 1 23 26
    105 whats up 2 3 87 92

    View on DB Fiddle

    Login or Signup to reply.
  2. This trick is the extra joins multiply the results for each additional join. It helps if we think through what the engine does to build the results.

    Let’s say you run this when only have one post, but that post has received 3 likes and 2 comments. After the first join from posts to likes, and before the group by, the result set so far has 3 rows:

    post_id like_id
    1 1
    1 2
    1 3

    Next we join to comments. This join starts with the set we built previous, not from the base posts table. So each of the three rows in the previous join adds it’s own set of comments. We end up with this:

    post_id like_id comment_id
    1 1 1
    1 1 2
    1 2 1
    1 2 2
    1 3 1
    1 3 2

    You can see how each of these builds up more rows exponentially. But we’re not done yet. There’s still a GROUP BY operation to handle. So now we group by post_id… but when we do, we’re grouping the entire set from the previous step, not just posts table. Taking the COUNT(like_id) within this group, anything that’s not null still counts. You count the 1 value two times (once for each like); you count the 2 value three times; and you count the 3 value three times, because they are not null and still there as part of the result set up to that point. It’s similar for the comments: you count the 1 and the 2 comments three times each (once for each like).

    Thankfully, post_read_count looks like it would be 1:1 or 1:0 with the posts table, so it won’t add to the issue, and with that in mind, for this query, most of the issue can be resolved by adding a distinct to the aggregate functions:

    SELECT p.post_id, p.title,
           count(distinct l.like_id) as likes,
           count(distinct c.comment_id) as comments,
           r.count as reads,
           (count(distinct l.like_id) + count(distinct c.comment_id) + r.count) as score
    FROM community.posts p
    LEFT JOIN community.post_read_count r ON r.post_id = p.post_id
    LEFT JOIN community.likes l ON l.post_id = p.post_id
    LEFT JOIN community.comments c ON c.post_id = p.post_id
    WHERE p.is_deleted = false
    GROUP BY p.post_id, p.title, r.count
    ORDER BY p.post_id DESC
    LIMIT 100
    

    See it here, with appreciation to @isolated saving me having to build the DDL script myself:

    https://www.db-fiddle.com/f/oeHccomPBVsskpVa3ftigN/0

    This ought to be significantly more efficient than the CTE-based solution. However, some queries will require you to take more drastic steps than just adding distinct to the aggregate function, either nesting subqueries or using CTEs to compute the aggregate values in isolation, so they don’t compound with rows from other tables. In that situation, understanding how the other answer works will be important.

    Also note how I used table alias mnemonics and included ALL the non-aggregated functions as part of the GROUP BY. Both of those are good practices you should be doing in pretty much every query.

    Finally, a complete side issue here: it seems very strange to weight a read the same as a like or comment.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search