I have a query to show the average length of all the proteins which also exists in a second table mrna_pieces
:
SELECT AVG(protein_length)
FROM protein
WHERE exists
(SELECT protein_id FROM mrna_pieces WHERE mrna_brokstukken.protein_id = protein.protein_id)
I also want to show the average for all the proteins which don’t exist in the second table.
SELECT AVG(protein_length)
FROM protein
WHERE exists
(SELECT protein_id FROM mrna_pieces WHERE mrna_brokstukken.protein_id != protein.protein_id)
But I want these 2 parts in one table like this
table example
I tried:
SELECT AVG(eiwit_lengte) AS avglengthwith, AVG(eiwit_lengte) AS avglengthwithout
FROM eiwit
WHERE exists
(SELECT eiwit_id FROM mrna_brokstukken WHERE mrna_brokstukken.eiwit_id != eiwit.eiwit_id)
WHERE exists
(SELECT eiwit_id FROM mrna_brokstukken WHERE mrna_brokstukken.eiwit_id = eiwit.eiwit_id)
But that gives me the following error:
ERROR: pq: syntax error at or near "WHERE"
3
Answers
Just select the two at the same time:
as both queries will gove back a scalar value you can simply use bot as subqueries
It should be cheapest to compute the "exists" value once per row in a subquery, then two conditional aggregates in the outer
SELECT
:About the aggregate
FILTER
clause (which can be combined with theOVER
clause of window functions):Or, shorter if two result rows are acceptable:
Should be slightly faster, yet.
Either way, if your table
mrna_pieces
is big, an index withprotein_id
as leading or only column helps performance (a lot).Note that the second query in your question is not doing what you are asking for.
EXISTS ( ... WHERE m.protein_id != p.protein_id)
is true if at least one row in tablemrna_pieces
does not match, which is always the case if there are at least two distinct values inm.protein_id
. You would wantNOT EXISTS ( ... WHERE m.protein_id = p.protein_id)
. But use the cheaper query I suggested instead.