This is my message
table. Now I am trying to write an query to group records by collection_id
(only first instance) if type
multi-store
else by message
.id
.
id | collection_id | type | affiliation_id | status | scheduled_for_date
--------------------------------------+--------------------------------------+----------------+----------------+--------------+--------------------
1143c066-01ed-4eb5-a146-68487de702a9 | bce85e31-4d2f-43b1-b263-57fca356856f | multi-store | 12091 | draft |
e1183732-e91d-42eb-9998-110e039cfc25 | bce85e31-4d2f-43b1-b263-57fca356856f | multi-store | 12092 | draft |
8f962a49-7da1-46d0-87b9-595788767dfe | bce85e31-4d2f-43b1-b263-57fca356856f | multi-store | 12097 | draft |
6e4dee09-7a4e-47be-bdd8-935a67bb2063 | 740f6b42-bbf1-4aeb-8fe9-6874635d9e29 | multi-store | 12091 | draft |
79afab0e-14e7-4d1b-9a15-358763743c3e | 740f6b42-bbf1-4aeb-8fe9-6874635d9e29 | multi-store | 12092 | draft |
7bc78bee-074a-4031-9492-954e7c4eeb09 | 740f6b42-bbf1-4aeb-8fe9-6874635d9e29 | multi-store | 12097 | draft |
3bb38fbd-d411-4f78-9c42-c858bf57b784 | | standard-store | 10511 | draft |
fbb3b175-1a3b-4515-b0b3-0ce6d6d0145f | | standard-store | 10511 | draft |
84004999-d2cf-4af4-bfaa-c1077d1d8621 | | standard-store | 10511 | sent | 2017-05-21
cbea0789-6886-431a-a8e1-723d5aafc7b9 | | standard-store | 10511 | scheduled | 2019-02-12
ec8988ff-5136-4b81-b448-cd456dc487a4 | | standard-store | 10511 | review | 2019-01-13
0e119440-5fbc-4afe-a784-a6bcfe3a6e4d | | standard-store | 10511 | draft |
98503a20-4396-4809-b3ec-8e330c15afa9 | | standard-store | 10511 | needs_action | 2018-12-11
d33a9173-dc64-464f-8e58-49b4c9c2fdae | | standard-store | 10511 | draft |
bee0dc72-acca-44e2-82ea-d18e830f91a2 | | standard-store | 10511 | sent | 2016-03-12
So the output will be like
id | collection_id | type | affiliation_id | status | scheduled_for_date
--------------------------------------+--------------------------------------+----------------+----------------+--------------+--------------------
1143c066-01ed-4eb5-a146-68487de702a9 | bce85e31-4d2f-43b1-b263-57fca356856f | multi-store | 12091 | draft |
6e4dee09-7a4e-47be-bdd8-935a67bb2063 | 740f6b42-bbf1-4aeb-8fe9-6874635d9e29 | multi-store | 12091 | draft |
3bb38fbd-d411-4f78-9c42-c858bf57b784 | | standard-store | 10511 | draft |
fbb3b175-1a3b-4515-b0b3-0ce6d6d0145f | | standard-store | 10511 | draft |
84004999-d2cf-4af4-bfaa-c1077d1d8621 | | standard-store | 10511 | sent | 2017-05-21
cbea0789-6886-431a-a8e1-723d5aafc7b9 | | standard-store | 10511 | scheduled | 2019-02-12
ec8988ff-5136-4b81-b448-cd456dc487a4 | | standard-store | 10511 | review | 2019-01-13
0e119440-5fbc-4afe-a784-a6bcfe3a6e4d | | standard-store | 10511 | draft |
98503a20-4396-4809-b3ec-8e330c15afa9 | | standard-store | 10511 | needs_action | 2018-12-11
d33a9173-dc64-464f-8e58-49b4c9c2fdae | | standard-store | 10511 | draft |
bee0dc72-acca-44e2-82ea-d18e830f91a2 | | standard-store | 10511 | sent | 2016-03-12
One way to do it is by union
:
SELECT DISTINCT ON (collection_id)
collection_id,
id,
type,
affiliation_id,
status,
scheduled_for_date
from
message
where
type = 'multi-store'
union
SELECT
collection_id,
id,
type,
affiliation_id,
status,
scheduled_for_date
from
message
where
type = 'standard-store'
But I feel that’s less efficient, another option could be case based grouping. But that includes complexity when need to add all the select fields to group.
What is the most efficient way to write the query?
2
Answers
You can use
distinct on
clause with acase
expression.DB-Fiddle demo
Or, assuming that
collection_id
is NULL for types other than ‘multi-store’, even simplerThe best solution depends on undisclosed cardinalities.
Assuming relatively few "multi-store" rows, and relatively few rows per distinct
collection_id
among those, yourUNION
query looks good – after a couple of tweaks:fiddle
Use
UNION ALL
. JustUNION
tries to eliminate (non-existent) duplicates, adding pointless cost.Typically, you want a deterministic, stable pick from each set with the same
collection_id
. Add a suitableORDER BY
clause accordingly. See:In this case, a partial index would be of help:
The index is used for the small share of "multi-store" rows, the rest is read from a sequential scan.