skip to Main Content

I have three tables:

  • forums with columns id, label, name
  • discussions with columns id, name, creation_date, open, forum_id, user_id
  • messages with columns id, content, date, updated_date, discussion_id, user_id

I want to get each forum and for each forum get the number of discussions and the number of messages. I currently have the following code:

SELECT 
    "Forum"."id", "Forum"."label", "Forum"."name",    
    COUNT("Discussions"."id") AS "nbDiscussions", 
    COUNT("Discussions->Messages"."id") AS "nbMessages"
FROM 
    "forums" AS "Forum"
LEFT OUTER JOIN 
    "discussions" AS "Discussions" ON "Forum"."id" = "Discussions"."forum_id"
LEFT OUTER JOIN 
    "messages" AS "Discussions->Messages" ON "Discussions"."id" = "Discussions->Messages"."discussion_id"
GROUP BY 
    "Forum"."id"
ORDER BY 
    "Forum"."id";

It almost works except that nbDiscussions have the same value as nbMessages. Why are they the same?

On the result that I have, it looks like nbDiscussions took the value of nbMessages.

2

Answers


  1. You may correct the counting problem in nbDiscussions by instead counting:

    COUNT(DISTINCT "Discussions"."id") AS "nbDiscussions"
    

    Full query:

    SELECT 
        "Forum"."id", "Forum"."label", "Forum"."name",    
        COUNT(DISTINCT "Discussions"."id") AS "nbDiscussions", 
        COUNT("Discussions->Messages"."id") AS "nbMessages"
    FROM "forums" AS "Forum"
    LEFT JOIN "discussions" AS "Discussions"
        ON "Forum"."id" = "Discussions"."forum_id"
    LEFT JOIN "messages" AS "Discussions->Messages"
        ON "Discussions"."id" = "Discussions->Messages"."discussion_id"
    GROUP BY 
        "Forum"."id"
    ORDER BY 
        "Forum"."id";
    

    The point to realize here is that the final join from discussions to messages will result in a final table where a given discussion id may be appear across multiple records, one for each message. Using DISTINCT avoids the over counting.

    Login or Signup to reply.
  2. While aggregating all (or most) rows, it is substantially faster to aggregate tables on the "many"-side first and join later. It rules out wrong (multiplied) counts, and columns from the primary table do not have to be aggregated at all (after those have not been multiplied to begin with):

    SELECT f.id, f.label, f.name
         , COALESCE(d.ct_d, 0) AS ct_discussions  -- ①
         , COALESCE(d.ct_m, 0) AS ct_messages
    FROM   forums f
    LEFT   JOIN (
       SELECT d.forum_id
            , count(*) AS ct_d
            , sum(m.ct_m) AS ct_m
       FROM   discussions d
       LEFT   JOIN (
          SELECT m.discussion_id
               , count(*) AS ct_m
          FROM   messages m
          GROUP  BY 1
          ) m ON m.discussion_id = d.id
       GROUP  BY 1
       ) d ON d.forum_id = f.id
    ORDER  BY f.id;
    

    ① Add COALESCE only if there can be forums without discussions or discussions without messages, and you want to display 0 instead of null.

    See:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search