This is my table structure:
PERSON (4.1 M rows)
id : UUID
notes: string
...
---
ATTRIBUTES (21 M rows)
id: UUID
person_id: <persons id> non null
name: ENUM : full_name | birthday | email | phone | photo
value: JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
edited_value: JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
...
---
1 Person -> N Attributes
So now the query response should look something like this:
person_id | notes | attributes |
---|---|---|
uuid | some stuff | [{a1},{a2}…(all attributes)] |
uuid | some stuff | [{a1},{a2}…(all attributes)] |
The issue is I need to sort and filter the list based on certain conditions.
- Sort based on
ATTRIBUTE->edited_value
||ATTRIBUTE->value
for particularATTRIBUTE->name
- like sort by first_name (so sort by rows in ATTRIBUTES where name=first_name)
- Filter by
Attributes -> name
- like GET all person who have a birthday (but the output
attributes
column should have all attributes)
- like GET all person who have a birthday (but the output
My current query looks like this:
- To get all person with
birthday
and sort by theirfirst_name
The query I have now looks like this but it takes 5 seconds to execute on local
SELECT
p.id,
JSONB_AGG(a.*) as attributes,
FROM
person p
LEFT JOIN
"attribute" a ON a.person_id = p.id
AND a.deleted is false
LEFT JOIN
"attribute" ba ON ba.name = 'birthday'
AND ba.person_id = p.id
AND ba.deleted is false
LEFT JOIN
"attribute" fa ON fa.name = 'full_name'
AND fa.person_id = p.id
AND fa.deleted is false
WHERE
p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
AND ba.id IS NOT NULL
AND p.deleted_at IS NULL
GROUP BY
p.id
ORDER BY
LOWER(MAX(COALESCE(fa.edited_value, fa.value)->>'first_name')) ASC NULLS LAST
I have an index on almost all the columns on the where clause except attribute.deleted
.
From the explain plan, the sort seems to be taking a lot of time
Sort (cost=46184.33..46185.19 rows=344 width=80) (actual time=284.351..284.604 rows=115 loops=1) |
Sort Key: (lower(max((COALESCE(fa.edited_value, fa.value) ->> 'first_name'::text)))) |
Sort Method: quicksort Memory: 1750kB |
Buffers: shared hit=7685 read=29921, temp read=314 written=315 |
-> GroupAggregate (cost=46159.52..46169.84 rows=344 width=80) (actual time=258.759..283.345 rows=115 loops=1) |
Group Key: p.id |
Buffers: shared hit=7685 read=29921, temp read=314 written=315 |
-> Sort (cost=46159.52..46160.38 rows=344 width=116) (actual time=258.709..261.751 rows=15967 loops=1) |
Sort Key: p.id |
Sort Method: external merge Disk: 2512kB |
Buffers: shared hit=7685 read=29921, temp read=314 written=315 |
-> Gather (cost=3553.65..46145.03 rows=344 width=116) (actual time=35.024..243.703 rows=15967 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
Buffers: shared hit=7685 read=29921 |
-> Nested Loop Left Join (cost=2553.65..45110.63 rows=143 width=116) (actual time=19.666..108.066 rows=5322 loops=3) |
Buffers: shared hit=7685 read=29921 |
-> Nested Loop Left Join (cost=2553.23..41298.71 rows=52 width=80) (actual time=18.861..100.495 rows=38 loops=3) |
Buffers: shared hit=6341 read=29921 |
-> Parallel Hash Join (cost=2552.80..40360.76 rows=52 width=16) (actual time=18.794..99.999 rows=38 loops=3) |
Hash Cond: (ba.contact_id = p.id) |
Buffers: shared hit=5879 read=29921 |
-> Parallel Seq Scan on attribute ba (cost=0.00..37806.51 rows=554 width=16) (actual time=0.624..94.651 rows=1072 loops=3) |
Filter: ((deleted IS FALSE) AND (id IS NOT NULL) AND (name = 'birthday'::attribute_name)) |
Rows Removed by Filter: 177008 |
Buffers: shared hit=5103 read=29921 |
-> Parallel Hash (cost=2486.66..2486.66 rows=5291 width=16) (actual time=3.893..3.894 rows=776 loops=3) |
Buckets: 16384 Batches: 1 Memory Usage: 256kB |
Buffers: shared hit=692 |
-> Parallel Bitmap Heap Scan on contact p (cost=504.00..2486.66 rows=5291 width=16) (actual time=1.132..10.831 rows=2329 loops=1) |
Recheck Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid) |
Filter: (deleted_at IS NULL) |
Rows Removed by Filter: 9180 |
Heap Blocks: exact=568 |
Buffers: shared hit=692 |
-> Bitmap Index Scan on contact_user_id_index (cost=0.00..501.75 rows=11378 width=0) (actual time=1.014..1.014 rows=11526 loops=1) |
Index Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid) |
Buffers: shared hit=120 |
-> Index Scan using attribute_contact_id_name_deleted_index on attribute fa (cost=0.42..17.97 rows=7 width=80) (actual time=0.011..0.011 rows=1 loops=115)|
Index Cond: ((contact_id = p.id) AND (name = 'full_name'::attribute_name) AND (deleted = false)) |
Buffers: shared hit=462 |
-> Index Scan using attribute_contact_id_name_deleted_index on attribute a (cost=0.42..59.95 rows=1336 width=52) (actual time=0.004..0.059 rows=139 loops=115) |
Index Cond: ((contact_id = p.id) AND (deleted = false)) |
Buffers: shared hit=1344 |
Planning Time: 0.467 ms |
Execution Time: 285.484 ms |
Can you guys give me any suggestion regarding the improvemnets i can do
Thanks
Edit 1 : Updated Explain Plain with Analyze , buffers
Edit 2 : Updated the query to remove Ambiguity
2
Answers
Quite frequently aggregating first, then joining to the result of the aggregate is faster than the other way round. If you don’t aggregate the attributes into an array of JSON (i.e.
jsonb[]
) but into a regular JSON value you can test for the presence of thebirthday
attribute directly.I find the
canonical_value
,value
andedited_value
a bit confusing.But maybe you can remove the left join for the
full_name
by creating another JSONB object inside the aggregation that contains the correct first_name and then reference that in the order by. So something along the lines:Your data model doesn’t very good, and the way your table and column names keep mutating over the course of the question is pretty confusing.
But it looks like you really need an index on "attribute" which starts with "name", or possibly a partial index on it with
WHERE deleted is false
.That should help improve the execution of the query whose plan you showed us, but since that only took 0.3s, not 5s, it is hard to know what it would do to the unseen 5s query plan.