I have 2 tables with a flexible one-to-many relationship, meaning a parent row may not have any children:
parent_table:
id  dim1   quantity  price
1   "abc"  1         1.00
child_table:
id  parent_id
1   1
2   1
The parent query is very straightforward:
select dim1, count(quantity), sum(price) from parent_table group by dim1
Now, I want to add another measure called "child_count" to the above query. The query below performs very well but, naturally, it double counts the parent columns because the join repeats each parent row once per child:
select p.dim1, count(p.quantity), sum(p.price), count(distinct c.id) as child_count
from parent_table p
left outer join child_table c on c.parent_id = p.id
group by dim1
I believe the query below is exactly what I want, but these tables are extremely large and the query never finishes, so I cannot verify it:
select p.dim1, count(p.quantity), sum(p.price), sum(child.child_count) as child_count
from parent_table p
left outer join (select c.parent_id, count(c.id) child_count from child_table c group by c.parent_id) child on p.id = child.parent_id
group by p.dim1
Does Postgres have any smart way to get the result I want without a nested query that counts the child records for each parent?
2 Answers
Delete distinct from your query. This yields the same result, but is faster. Being the PK of child_table, child_table.id is already unique, so applying distinct will slow your query down for no benefit.
Going out on a limb, you probably want this query:
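A minimal sketch of such a query, assuming quantity is meant to be summed and that child_table.parent_id references parent_table.id as in the sample data. The column aliases are illustrative and do not come from the question:

```sql
-- Sketch only: count children per parent first, then join once,
-- so each parent row matches at most one pre-aggregated row.
SELECT p.dim1
     , sum(p.quantity)                 AS quantity_sum  -- assumption: quantity should be summed
     , sum(p.price)                    AS price_sum
     , COALESCE(sum(c.child_count), 0) AS child_count   -- 0 when no parent in the group has children
FROM   parent_table p
LEFT   JOIN (
   SELECT parent_id, count(*) AS child_count
   FROM   child_table
   GROUP  BY parent_id
   ) c ON c.parent_id = p.id
GROUP  BY p.dim1;
```

With the sample rows from the question (one parent, two children), this returns a single row for "abc" with a quantity sum of 1, a price sum of 1.00 and a child_count of 2, because the parent row is no longer duplicated by the join.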
Filling in missing information from experience with this kind of question.
The point being: aggregate rows in the child table before joining to the parent table.
Throwing in COALESCE to convert possible null values.
Also converted count(quantity) to sum(p.quantity) on suspicion. If quantity is always 1, use count(*) instead and drop the column quantity.
An index on child_table.parent_id would help performance substantially, if you get an index-only scan out of it.
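A sketch of such an index and a way to check the plan. The index name is made up here, and whether the planner actually chooses an index-only scan depends on table statistics and on the visibility map being reasonably current, which VACUUM maintains:

```sql
-- Hypothetical index name; the indexed column is the join key from the question.
CREATE INDEX child_table_parent_id_idx ON child_table (parent_id);

-- Refresh the visibility map and statistics so an index-only scan becomes possible.
VACUUM ANALYZE child_table;

-- Inspect the plan of the per-parent count; look for "Index Only Scan" in the output.
EXPLAIN
SELECT parent_id, count(*) AS child_count
FROM   child_table
GROUP  BY parent_id;
```

On a large, busy table, CREATE INDEX CONCURRENTLY may be preferable because it builds the index without blocking writes, at the cost of a slower build.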