sum greater than in subquery - Postgresql

Typewar
January 2, 2023
190 views
0 votes
2 Answers

I’m making a database in PostgreSQL that involves around democracy. All data that should be displayed are controlled by the users, and their percentage of power.

I’m struggling to write this SQL query where a collection of tags on a post should only be shown once the sum of all the percentage for each tag reaches a certain criteria.

The relations between the tables (relevant to this question) looks like this:

The post_tags table is used for deciding what tag stays on what post, decided by the users based on their percentage.

It may look something like this

approved_by	post_id	tag_id	percentage
1	1	1	0.33
5	1	3	0.45
7	1	3	0.25
6	1	3	0.15
4	1	1	0.90
1	1	2	0.45
1	1	6	-0.60
6	1	2	-0.15

How do you write an SQL query that selects a post and its tags if the percentage sum is above a certain threshold?

In the case of SUM(post_tags.percentage) > 0.75, only tag with tag_id 1 and 3 should show.

So far, I have written this SQL query, but it contains duplicates in the array_agg (might be a separate issue), and the HAVING only seem to depend on the total sum of all the tags in the array_agg.

SELECT
    posts.post_id, array_agg(tags.name) AS tags
FROM
    posts, tags, post_tags
WHERE
    post_tags.post_id = posts.post_id AND
    post_tags.tag_id = tags.tag_id
GROUP BY
    posts.post_id
HAVING
    SUM(post_tags.percentage) > 0.75
LIMIT 10;

I assume I might need to do a subquery within the query, but you can’t do SUM inside the WHERE clause. I’m a bit lost at this issue.

Any help is appreciated

UPDATE

Because I think there needs to be atleast 2 queries into play, I think this should be one of them

SELECT
    tags.name
FROM
    post_tags, posts, tags
WHERE
    post_tags.tag_id = tags.tag_id AND
    post_tags.post_id = posts.post_id AND
    posts.post_id = 1
GROUP BY
    tags.tag_id
HAVING
    SUM(post_tags.percentage) > 0.75

In this case, it’s only for post 1, and I don’t know how to continue this query for all posts

Tags: postgresql sql

Answers

Chosen as BEST ANSWER
- Typewar
- January 3, 2023 at 10:21 am
- 0 votes
0
It's easy to get confused, but start small, and then expand the SQL query as you go.

Note the inner parenthesis will execute first. Start with the inner query, and then work on the outer query when building SQL queries.

In this case, for finding the tags relevant for a single post can be written like so
```
SELECT
    t.name
FROM
    post_tags
INNER JOIN tags t ON t.tag_id = post_tags.tag_id
INNER JOIN posts p2 ON p2.post_id = post_tags.post_id AND p2.post_id = 1
GROUP BY
    t.tag_id
HAVING
    SUM(post_tags.percentage) > 0.75
```
To expand on this, and apply the query for every post, switch out the 1 and set it equal the outer scope. The complete SQL query becomes this:
```
SELECT p.post_id, ARRAY(
    SELECT
        t.name
    FROM
        post_tags
    INNER JOIN tags t ON t.tag_id = post_tags.tag_id
    INNER JOIN posts p2 ON p2.post_id = post_tags.post_id AND p2.post_id = p.post_id
    GROUP BY
        t.tag_id
    HAVING
        SUM(post_tags.percentage) > 0.75
) AS tags
FROM posts p
```
Big thanks to Tim Biegeleisen who helped change out the FROM statements to INNER JOIN (tho performance-wise, both are tested equally fast in this case).

(Edit)

One idea would be to first aggregate the total percentage for each post/tag pair in a subquery. The subquery gives you a new join table unique_post_tags (one entry per post and tag, including the total_percentage for each post/tag pair). You can then select from post_tags_unique in the outer query, filtering irrelevant tags in the WHERE clause:

SELECT unique_post_tags.post_id, unique_post_tags.tag_id FROM 
  (
    SELECT post_id, tag_id, sum(percentage) as total_percentage 
    FROM post_tags
    GROUP BY post_id, tag_id
  ) AS unique_post_tags
WHERE unique_post_tag.total_percentage > 0.75

To actually select the tag names per post and group it into an array as you requested, the above query can be extended like this:

SELECT unique_post_tags.post_id, array_agg(t.name) AS tags FROM 
  (
    SELECT post_id, tag_id, sum(percentage) as total_percentage 
    FROM post_tags
    GROUP BY post_id, tag_id
  ) AS unique_post_tags
LEFT JOIN tags t ON t.id = unique_post_tags.tag_id
WHERE unique_post_tags.total_percentage > 0.75
GROUP BY unique_post_tags.post_id

Update

After looking at your answer more closely, I now realize that my idea of reducing the join table to the relevant entries first, can be implemented entirely in the subquery using the GROUP BY/HAVING approach you initially suggested:

SELECT relevant_post_tags.post_id , array_agg(t.name) AS tags FROM
  (
    SELECT post_id, tag_id
    FROM post_tags
    GROUP BY post_id, tag_id
    HAVING SUM(percentage) > 0.75
  ) AS relevant_post_tags
LEFT JOIN tags t ON t.id = relevant_post_tags.tag_id
GROUP BY relevant_post_tags.post_id;

Or written as CTE (for readability):

WITH relevant_post_tags AS (
  SELECT post_id, tag_id
  FROM post_tags
  GROUP BY post_id, tag_id
  HAVING SUM(percentage) > 0.75)
)
SELECT relevant_post_tags.post_id, array_agg(t.name) AS tags
FROM relevant_post_tags
LEFT JOIN tags t ON t.id = relevant_post_tags.tag_id
GROUP BY relevant_post_tags.post_id;

If the 0.75 limit is static, you could also create a relevant_post_tags view in the DB and select from there directly. I did not look at the performance of the above (my guess would be that the query optimizer takes care of it, just note that using CTEs had some pitfalls in earlier Postgres versions).

The approach I came up with is a bit different from what you initially asked for, the result set for the queries above will only contain posts that actually have tags.

If you need to select all posts, you can expand like this:

WITH relevant_post_tags AS (
  SELECT post_id, tag_id
  FROM post_tags
  GROUP BY post_id, tag_id
  HAVING SUM(percentage) > 0.75
)
SELECT p.id, array_remove(array_agg(t.name), NULL) AS tags
FROM posts p
LEFT JOIN relevant_post_tags pt on pt.post_id = p.id
LEFT JOIN tags t ON t.id = pt.tag_id
GROUP BY p.id;

Or closer to your solution:

WITH relevant_post_tags AS (
  SELECT post_id, tag_id
  FROM post_tags
  GROUP BY post_id, tag_id
  HAVING SUM(percentage) > 0.75
)
SELECT p.id, ARRAY(
  SELECT t.name
  FROM relevant_post_tags pt
  JOIN tags t ON t.id = pt.tag_id 
  WHERE pt.post_id = p.id
)
FROM posts p;

Please signup or login to give your own answer.

Click here to cancel reply.

sum greater than in subquery – Postgresql

UPDATE

Answers