skip to Main Content

EDIT:

As requested, our table schema is,

posts:

  • postid (primary key),
  • post_text

comments:

  • commentid (primary key) ,
  • postid (foreign key referencing posts.postid),
  • comment_text

replies

  • replyid (primary key)
  • commentid (foreign key referencing comments.commentid)
  • reply_text

I have the tables posts, comments, and replies in a SQL database. (Obviously, a post can have comments, and a comment can have replies)

I want to return a post based on its id, postid.

So I would like a database function has the inputs and outputs,

input:

postid

output:

post = {
    postid
    post_text
    
    comments: [comment, ...]
}

Where the comment and reply are nested in the post,

comment = {
    commentid, 
    text
    
    replies: [reply, ...]

}
reply = {
    replyid
    reply_text
}

I have tried using joins, but the returned data is highly redundant, and it seems stupid. For instance, fetching the data from two different replies will give,

postid post_text commentid comment_text replyid reply_text
1 POST_TEXT 78 COMMENT_TEXT 14 REPLY1_TEXT
1 POST_TEXT 78 COMMENT_TEXT 15 REPLY2_TEXT

It seems instead I want to make 3 separate queries, in sequence (first to the table posts, then to comments, then to replies)

How do I do this?


3

Answers


  1. The “highly redundant” join result is normally the best way, because it is the natural thing in a relational database. Relational databases aim at avoiding redundancy in data storage, but not in query output. Avoiding that redundancy comes at an extra cost: you have to aggregate the data on the server side, and the client probably has to unpack the nested JSON data again.

    Here is some sample code that demonstrates how you could aggregate the results:

    SELECT postid, post_text,
           jsonb_agg(
              jsonb_build_object(
                 'commentid', commentid,
                 'comment_text', comment_text,
                 'replies', replies
              )
           ) AS comments
    FROM (SELECT postid, post_text, commentid, comment_text,
                 jsonb_agg(
                    jsonb_build_object(
                       'replyid', replyid,
                       'reply_text', reply_text
                    )
                 ) AS replies
          FROM /* your join */
          GROUP BY postid, post_text, commentid, comment_text) AS q
    GROUP BY postid, post_text;
    
    Login or Signup to reply.
  2. The redundant data stems from a cross join of a post’s comments and replies. I.e. for each post you join each comment with each reply. Comment 78 does neither relate to reply 14 nor to reply 15, but merely to the same post.

    The typical approach to select the data would hence be three queries:

    select * from posts;
    select * from comments;
    select * from replies;
    

    You can also reduce this to two queries and join the posts table to the comments query, the replies query, or both. This again, will lead to selecting redundant data, but may ease data handling in your app.

    If you want to avoid joins, but must avoid database round trips, you can glue query results together:

    select *
    from
    (
      select postid as id, postid, 'post' as type, post_text as text from posts
        union all
      select commentid as id, postid, 'comment' as type, comment_text as text from comments
        union all
      select replyid as id, postid, 'reply' as type, reply_text as text from replies
    ) glued
    order by postid, type, id;
    

    At last you can create JSON in your DBMS. Again, don’t cross join comments with replies, but join the aggregated comments object and the aggregated replies object to the post.

    select p.postid, p.post_text, c.comments, r.replies
    from posts p
    left join
    (
      select
        postid,
        jsonb_object_agg(jsonb_build_object('commentid', commentid, 
                                            'comment_text', comment_text)
                        ) as comments
      from comments
      group by postid
    ) c on c.postid = p.postid
    left join
    (
      select
        postid,
        jsonb_object_agg(jsonb_build_object('replyid', replyid, 
                                            'reply_text', reply_text)
                        ) as replies
      from replies
      group by postid
    ) r on r.postid = p.postid;
    
    Login or Signup to reply.
  3. Your idea to store things in JSON is a good one if you have something to parse it down the line.

    As an alternative to the previous answers that involve JSON, you can also get a normal SQL result set (table definition and sample data are below the query):

    WITH MyFilter(postid) AS (
    VALUES (1),(2) /* rest of your filter */
    )
    SELECT 'Post' AS PublicationType, postid, NULL AS CommentID, NULL As ReplyToID, post_text
    FROM Posts
    WHERE postID IN (SELECT postid from MyFilter)
    UNION ALL
    SELECT CASE ReplyToID WHEN NULL THEN 'Comment' ELSE 'Reply' END, postid, commentid, replyToID, comment_text
    FROM Comments
    WHERE postid IN (SELECT postid from MyFilter)
    ORDER BY postid, CommentID NULLS FIRST, ReplyToID NULLS FIRST
    

    Note: the PublicationType column was added for the sake of clarity. You can alternatively inspect CommentID and ReplyToId and see what is null to determine the type of publication.

    This should leave you with very little, if any, redundant data to transfer back to the SQL client.


    This approach with UNION ALL will work with 3 tables too (you only have to add 1 UNION ALL) but in your case, I would rather go with a 2-table schema:

    CREATE TABLE posts (
    postid SERIAL primary key,
    post_text text NOT NULL
    );
    CREATE TABLE comments (
    commentid SERIAL primary key,
    ReplyToID INTEGER NULL REFERENCES Comments(CommentID) /* ON DELETE CASCADE? */,
    postid INTEGER NOT NULL references posts(postid) /* ON DELETE CASCADE? */,
    comment_text Text NOT NULL
    );
    
    
    INSERT INTO posts(post_text) VALUES ('Post 1'),('Post 2'),('Post 3');
    INSERT INTO Comments(postid, comment_text) VALUES (1, 'Comment 1.1'), (1, 'Comment 1.2'), (2, 'Comment 2.1');
    INSERT INTO Comments(replytoId, postid, comment_text) VALUES (1, 1, 'Reply to comment 1.1'), (3, 2, 'Reply to comment 2.1');
    

    This makes 1 fewer table and allows to have level 2 replies (replies to replies), or more, rather than just replies to comments. A recursive query (there are plenty of samples of that on SO) can make it so a reply can always be linked back to the original comment if you want.


    Edit: I noticed your comment just a bit late. Of course, no matter what solution you take, there is no need to execute a request to get the replies to each and every comment.

    Even with 3 tables, even without JSON, the query to get all the replies for all the comments at once is:

    SELECT *
    FROM replies
    WHERE commentid IN (
        SELECT commentid
        FROM comments
        WHERE postid IN (
            /* List your post ids here or nest another SELECT postid FROM posts WHERE ... */
        )
    )
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search