EDIT:
As requested, our table schema is,
posts:
postid
(primary key),post_text
comments:
commentid
(primary key) ,postid
(foreign key referencing posts.postid
),comment_text
replies
replyid
(primary key)commentid
(foreign key referencing comments.commentid)reply_text
I have the tables posts, comments, and replies in a SQL database. (Obviously, a post can have comments, and a comment can have replies)
I want to return a post based on its id, postid
.
So I would like a database function has the inputs and outputs,
input:
postid
output:
post = {
postid
post_text
comments: [comment, ...]
}
Where the comment and reply are nested in the post,
comment = {
commentid,
text
replies: [reply, ...]
}
reply = {
replyid
reply_text
}
I have tried using joins, but the returned data is highly redundant, and it seems stupid. For instance, fetching the data from two different replies will give,
postid | post_text | commentid | comment_text | replyid | reply_text |
---|---|---|---|---|---|
1 | POST_TEXT | 78 | COMMENT_TEXT | 14 | REPLY1_TEXT |
1 | POST_TEXT | 78 | COMMENT_TEXT | 15 | REPLY2_TEXT |
It seems instead I want to make 3 separate queries, in sequence (first to the table posts, then to comments, then to replies)
How do I do this?
3
Answers
The “highly redundant” join result is normally the best way, because it is the natural thing in a relational database. Relational databases aim at avoiding redundancy in data storage, but not in query output. Avoiding that redundancy comes at an extra cost: you have to aggregate the data on the server side, and the client probably has to unpack the nested JSON data again.
Here is some sample code that demonstrates how you could aggregate the results:
The redundant data stems from a cross join of a post’s comments and replies. I.e. for each post you join each comment with each reply. Comment 78 does neither relate to reply 14 nor to reply 15, but merely to the same post.
The typical approach to select the data would hence be three queries:
You can also reduce this to two queries and join the posts table to the comments query, the replies query, or both. This again, will lead to selecting redundant data, but may ease data handling in your app.
If you want to avoid joins, but must avoid database round trips, you can glue query results together:
At last you can create JSON in your DBMS. Again, don’t cross join comments with replies, but join the aggregated comments object and the aggregated replies object to the post.
Your idea to store things in JSON is a good one if you have something to parse it down the line.
As an alternative to the previous answers that involve JSON, you can also get a normal SQL result set (table definition and sample data are below the query):
Note: the
PublicationType
column was added for the sake of clarity. You can alternatively inspectCommentID
andReplyToId
and see what is null to determine the type of publication.This should leave you with very little, if any, redundant data to transfer back to the SQL client.
This approach with
UNION ALL
will work with 3 tables too (you only have to add 1UNION ALL
) but in your case, I would rather go with a 2-table schema:This makes 1 fewer table and allows to have level 2 replies (replies to replies), or more, rather than just replies to comments. A recursive query (there are plenty of samples of that on SO) can make it so a reply can always be linked back to the original comment if you want.
Edit: I noticed your comment just a bit late. Of course, no matter what solution you take, there is no need to execute a request to get the replies to each and every comment.
Even with 3 tables, even without JSON, the query to get all the replies for all the comments at once is: