skip to Main Content

I have 2 tables with flex one-to-many relationship, meaning the parent side may not have any child:

Parent_table

id   dim1   quantity    price
1    "abc"     1         1.00

child_table:

id       parent_id
1          1
2          1

The parent query is very straight forward:

select dim1, count(quantity), sum(price) from parent_table group by dim1

Now, I want to also add another measure call "child_count" to the above query. By doing this, I get a very good performant query but naturally, it double counts the parent columns due to the join:

select p.dim1, count(p.quantity), sum(p.price), count(distinct c.id) as child_count
from parent_table p
  left outer join child_table c on c.parent_id = p.id
group by dim1

I believe the below query is exactly what I want, but these tables are extremely large and I cannot finish the query to verify:

select p.dim1, count(p.quantity), sum(p.price), sum(child_count) as child_count
from parent_table p
  left outer join (select c.parent_id, count(c.id) child_count from child_table c) child on p.id = child.parent_id
group by dim1

Do you have or do you know if Postgres has any smart way to achieve the query I wanted without doing a nested query to count the child records for each parent?

2

Answers


  1. Delete distinct from your query. This yields the same result, but is faster:

    select dim1, count(p.quantity), sum(p.price), count(c.id) as child_count
    from parent_table p
    left join child_table c on c.parent_id = p.id
    group by dim1
    

    Being the PK of child_table, child_table.id is already unique so applying distinct will slow your query down for no benefit.

    Login or Signup to reply.
  2. Going out on a limb, you probably want this query:

    SELECT p.dim1
         , sum(p.quantity) AS sum_quantity
         , sum(p.price) AS sum_price
         , COALESCE(sum(child_count), 0) AS sum_child_count
    FROM   parent_table p
    LEFT   JOIN (
       SELECT parent_id AS id, count(*) AS child_count
       FROM   child_table
       GROUP  BY 1
       ) c USING (id)
    GROUP  BY dim1;
    

    Filling in experience with this kind of questions for missing information.

    The point being: aggregate rows in the child table before joining to the parent table. See:

    Throwing in COALESCE to convert possible null values.

    Also converted to count(quantity) to sum(p.quantity) on suspicion . If quantity is always 1, use count(*) instead. And drop the column quantity.

    An index on child_table.parent_id would help performance substantially – if you get an index-only scan out of it.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search