skip to Main Content

My goal

To calculate the sum of components sold as part of their parent assemblies.

I’m sure this must be a common use case, but I haven’t yet found documentation that leads to the result I’m looking for.

Background

I’m running Postgres 11 on CentOS 7.
I have some tables like as follows:

CREATE TABLE the_schema.names_categories (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    thing_name TEXT NOT NULL, 
    thing_category TEXT NOT NULL
);

CREATE TABLE the_schema.relator (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    parent_name TEXT NOT NULL, 
    child_name TEXT NOT NULL,
    child_quantity INTEGER NOT NULL 
);

CREATE TABLE the_schema.sales (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),    
    sold_name TEXT NOT NULL,
    sold_quantity INTEGER NOT NULL
);

And a view like so, which is mainly to associate the category key with relator.child_name for filtering:

CREATE VIEW the_schema.relationships_with_child_catetgory AS (
    SELECT 
    r.parent_name,
    r.child_name, 
    r.child_quantity,
    n.thing_category AS child_category
    FROM 
    the_schema.relator r 
    INNER JOIN 
    the_schema.names_categories n 
    ON r.child_name = n.thing_name 
);

And these tables contain some data like this:

INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');

INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);

INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);

I need to construct a query that, given these data, will return something like the following:

 child_name | sum_sold 
------------+----------
 subChild1  |       30
 subChild2  |        6
(2 rows)

The problem is that I haven’t the first idea how to go about this and in fact it’s getting scarier as I type. I’m having a really hard time visualizing the connections that need to be made, so it’s difficult to get started in a logical way.
Usually, Molinaro’s SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.

Based on my research on this site, it seems like I probably need to use a recursive CTE /Common Table Expression, as demonstrated in this question/answer, but I’m having considerable difficulty understanding this method and how to use this it for my case.

Aping the example from E. Brandstetter’s answer linked above, I arrive at:

WITH RECURSIVE cte AS (
    SELECT 
    s.sold_name,
    r.child_name, 
    s.sold_quantity AS total 
    FROM 
    the_schema.sales s
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r 
    ON s.sold_name = r.parent_name

    UNION  ALL

    SELECT 
    c.sold_name, 
    r.child_name, 
    (c.total  *  r.child_quantity)
    FROM 
    cte c 
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r
    ON r.parent_name = c.child_name
) SELECT * FROM cte 

which gets part of the way there:

 sold_name | child_name | total 
-----------+------------+-------
 parent1   | child1     |     1
 parent1   | child1     |     2
 parent1   | subChild1  |    10
 parent1   | subChild1  |    20
 parent1   | subChild2  |     2
 parent1   | subChild2  |     4
(6 rows)

However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component' to both parts, the query returns no rows:

 sold_name | child_name | total 
-----------+------------+-------
(0 rows)

and when I try to group/aggregate, it gives the following error:
ERROR: aggregate functions are not allowed in a recursive query's recursive term

I’m stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I’m failing to comprehend how this recursive CTE works. All guidance is appreciated!

2

Answers


  1. Chosen as BEST ANSWER

    Well, I figured out that the CTE can be used as a subquery, which permits the filtering and aggregation that I needed :

    SELECT
    cte.child_name, 
    sum(cte.total)
    FROM
    (
    WITH RECURSIVE cte AS (
        SELECT 
        s.sold_name,
        r.child_name, 
        s.sold_quantity AS total 
        FROM 
        the_schema.sales s
        INNER JOIN 
        the_schema.relationships_with_child_catetgory r 
        ON s.sold_name = r.parent_name
    
        UNION  ALL
    
        SELECT 
        c.sold_name, 
        r.child_name, 
        (c.total  *  r.child_quantity)
        FROM 
        cte c 
        INNER JOIN 
        the_schema.relationships_with_child_catetgory r
        ON r.parent_name = c.child_name
    
    ) SELECT * FROM cte  ) AS cte
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r1 
    ON cte.child_name = r1.child_name 
    WHERE r1.child_category = 'component'
    GROUP BY cte.child_name
    ;
    

    which gives the desired rows:

     child_name | sum 
    ------------+-----
     subChild2  |   6
     subChild1  |  30
    (2 rows)
    

    Which is good and probably enough for the actual case at hand-- but I suspect there's a clearner way to go about this, so I'll be eager to read all other offered answers.


  2. Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a WHERE filter and a SUM aggregation afterwards:

    SELECT
        child_name,
        SUM(sold_quantity * child_quantity)
    FROM cte
    WHERE category = 'component'
    GROUP BY child_name
    

    My entire query looks like this (which only differs in the details I mentioned above from yours):

    demo:db<>fiddle

    WITH RECURSIVE cte AS (
        SELECT 
            s.sold_name,
            s.sold_quantity,
            r.child_name,
            r.child_quantity,
            nc.thing_category as category
        FROM 
            sales s
        JOIN relator r
        ON s.sold_name = r.parent_name
        JOIN names_categories nc
        ON r.child_name = nc.thing_name
        
        UNION ALL
        
        SELECT
            cte.sold_name,
            cte.sold_quantity,
            r.child_name,
            r.child_quantity,
            nc.thing_category
        FROM cte
        JOIN relator r ON cte.child_name = r.parent_name
        JOIN names_categories nc
        ON r.child_name = nc.thing_name
    )
    SELECT
        child_name,
        SUM(sold_quantity * child_quantity)
    FROM cte
    WHERE category = 'component'
    GROUP BY child_name
    

    Note: I didn’t use your view, because I found it more handy to fetch the data from directly from the tables instead of joining data I already have. But that’s just the way I personally like it 🙂

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search