My goal
To calculate the sum of components sold as part of their parent assemblies.
I’m sure this must be a common use case, but I haven’t yet found documentation that leads to the result I’m looking for.
Background
I’m running Postgres 11 on CentOS 7.
I have some tables like as follows:
CREATE TABLE the_schema.names_categories (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
thing_name TEXT NOT NULL,
thing_category TEXT NOT NULL
);
CREATE TABLE the_schema.relator (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
parent_name TEXT NOT NULL,
child_name TEXT NOT NULL,
child_quantity INTEGER NOT NULL
);
CREATE TABLE the_schema.sales (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
sold_name TEXT NOT NULL,
sold_quantity INTEGER NOT NULL
);
And a view like so, which is mainly to associate the category key with relator.child_name for filtering:
CREATE VIEW the_schema.relationships_with_child_catetgory AS (
SELECT
r.parent_name,
r.child_name,
r.child_quantity,
n.thing_category AS child_category
FROM
the_schema.relator r
INNER JOIN
the_schema.names_categories n
ON r.child_name = n.thing_name
);
And these tables contain some data like this:
INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);
I need to construct a query that, given these data, will return something like the following:
child_name | sum_sold
------------+----------
subChild1 | 30
subChild2 | 6
(2 rows)
The problem is that I haven’t the first idea how to go about this and in fact it’s getting scarier as I type. I’m having a really hard time visualizing the connections that need to be made, so it’s difficult to get started in a logical way.
Usually, Molinaro’s SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.
Based on my research on this site, it seems like I probably need to use a recursive CTE /Common Table Expression, as demonstrated in this question/answer, but I’m having considerable difficulty understanding this method and how to use this it for my case.
Aping the example from E. Brandstetter’s answer linked above, I arrive at:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte
which gets part of the way there:
sold_name | child_name | total
-----------+------------+-------
parent1 | child1 | 1
parent1 | child1 | 2
parent1 | subChild1 | 10
parent1 | subChild1 | 20
parent1 | subChild2 | 2
parent1 | subChild2 | 4
(6 rows)
However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component'
to both parts, the query returns no rows:
sold_name | child_name | total
-----------+------------+-------
(0 rows)
and when I try to group/aggregate, it gives the following error:
ERROR: aggregate functions are not allowed in a recursive query's recursive term
I’m stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I’m failing to comprehend how this recursive CTE works. All guidance is appreciated!
2
Answers
Well, I figured out that the CTE can be used as a subquery, which permits the filtering and aggregation that I needed :
which gives the desired rows:
Which is good and probably enough for the actual case at hand-- but I suspect there's a clearner way to go about this, so I'll be eager to read all other offered answers.
Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a
WHERE
filter and aSUM
aggregation afterwards:My entire query looks like this (which only differs in the details I mentioned above from yours):
demo:db<>fiddle
Note: I didn’t use your view, because I found it more handy to fetch the data from directly from the tables instead of joining data I already have. But that’s just the way I personally like it 🙂