PostgreSQL joining same table twice one for getting data and other for filtering

mohamedArshad
November 24, 2022
177 views
0 votes
2 Answers

This is my table structure:

PERSON (4.1 M rows)
id : UUID
notes: string
...
---

ATTRIBUTES (21 M rows)
id:             UUID
person_id:      <persons id> non null
name:           ENUM : full_name | birthday | email | phone | photo
value:          JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
edited_value:   JSONB : {value:""} | { first_name:""} | {arbitrary schema based on `name`}
...

---

1 Person  -> N Attributes

So now the query response should look something like this:

person_id	notes	attributes
uuid	some stuff	[{a1},{a2}…(all attributes)]
uuid	some stuff	[{a1},{a2}…(all attributes)]

The issue is I need to sort and filter the list based on certain conditions.

Sort based on ATTRIBUTE->edited_value||ATTRIBUTE->value for particular ATTRIBUTE->name
- like sort by first_name (so sort by rows in ATTRIBUTES where name=first_name)
Filter by Attributes -> name
- like GET all person who have a birthday (but the output attributes column should have all attributes)

My current query looks like this:

To get all person with birthday and sort by their first_name

The query I have now looks like this but it takes 5 seconds to execute on local

SELECT
    p.id,
    JSONB_AGG(a.*) as attributes,
FROM
    person p
LEFT JOIN
    "attribute" a ON a.person_id = p.id
                  AND a.deleted is false
LEFT JOIN 
    "attribute" ba ON ba.name = 'birthday'
                   AND ba.person_id = p.id
                   AND ba.deleted is false
LEFT JOIN 
    "attribute" fa ON fa.name = 'full_name'
                   AND fa.person_id = p.id
                   AND fa.deleted is false
WHERE
    p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
    AND ba.id IS NOT NULL
    AND p.deleted_at IS NULL
GROUP BY
    p.id
ORDER BY
    LOWER(MAX(COALESCE(fa.edited_value, fa.value)->>'first_name')) ASC NULLS LAST

I have an index on almost all the columns on the where clause except attribute.deleted.

From the explain plan, the sort seems to be taking a lot of time

Sort  (cost=46184.33..46185.19 rows=344 width=80) (actual time=284.351..284.604 rows=115 loops=1)                                                                                           |
  Sort Key: (lower(max((COALESCE(fa.edited_value, fa.value) ->> 'first_name'::text))))                                                                                                      |
  Sort Method: quicksort  Memory: 1750kB                                                                                                                                                    |
  Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                            |
  ->  GroupAggregate  (cost=46159.52..46169.84 rows=344 width=80) (actual time=258.759..283.345 rows=115 loops=1)                                                                           |
        Group Key: p.id                                                                                                                                                                     |
        Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                      |
        ->  Sort  (cost=46159.52..46160.38 rows=344 width=116) (actual time=258.709..261.751 rows=15967 loops=1)                                                                            |
              Sort Key: p.id                                                                                                                                                                |
              Sort Method: external merge  Disk: 2512kB                                                                                                                                     |
              Buffers: shared hit=7685 read=29921, temp read=314 written=315                                                                                                                |
              ->  Gather  (cost=3553.65..46145.03 rows=344 width=116) (actual time=35.024..243.703 rows=15967 loops=1)                                                                      |
                    Workers Planned: 2                                                                                                                                                      |
                    Workers Launched: 2                                                                                                                                                     |
                    Buffers: shared hit=7685 read=29921                                                                                                                                     |
                    ->  Nested Loop Left Join  (cost=2553.65..45110.63 rows=143 width=116) (actual time=19.666..108.066 rows=5322 loops=3)                                                  |
                          Buffers: shared hit=7685 read=29921                                                                                                                               |
                          ->  Nested Loop Left Join  (cost=2553.23..41298.71 rows=52 width=80) (actual time=18.861..100.495 rows=38 loops=3)                                                |
                                Buffers: shared hit=6341 read=29921                                                                                                                         |
                                ->  Parallel Hash Join  (cost=2552.80..40360.76 rows=52 width=16) (actual time=18.794..99.999 rows=38 loops=3)                                              |
                                      Hash Cond: (ba.contact_id = p.id)                                                                                                                     |
                                      Buffers: shared hit=5879 read=29921                                                                                                                   |
                                      ->  Parallel Seq Scan on attribute ba  (cost=0.00..37806.51 rows=554 width=16) (actual time=0.624..94.651 rows=1072 loops=3)                          |
                                            Filter: ((deleted IS FALSE) AND (id IS NOT NULL) AND (name = 'birthday'::attribute_name))                                                       |
                                            Rows Removed by Filter: 177008                                                                                                                  |
                                            Buffers: shared hit=5103 read=29921                                                                                                             |
                                      ->  Parallel Hash  (cost=2486.66..2486.66 rows=5291 width=16) (actual time=3.893..3.894 rows=776 loops=3)                                             |
                                            Buckets: 16384  Batches: 1  Memory Usage: 256kB                                                                                                 |
                                            Buffers: shared hit=692                                                                                                                         |
                                            ->  Parallel Bitmap Heap Scan on contact p  (cost=504.00..2486.66 rows=5291 width=16) (actual time=1.132..10.831 rows=2329 loops=1)             |
                                                  Recheck Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid)                                                                    |
                                                  Filter: (deleted_at IS NULL)                                                                                                              |
                                                  Rows Removed by Filter: 9180                                                                                                              |
                                                  Heap Blocks: exact=568                                                                                                                    |
                                                  Buffers: shared hit=692                                                                                                                   |
                                                  ->  Bitmap Index Scan on contact_user_id_index  (cost=0.00..501.75 rows=11378 width=0) (actual time=1.014..1.014 rows=11526 loops=1)      |
                                                        Index Cond: (user_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'::uuid)                                                                |
                                                        Buffers: shared hit=120                                                                                                             |
                                ->  Index Scan using attribute_contact_id_name_deleted_index on attribute fa  (cost=0.42..17.97 rows=7 width=80) (actual time=0.011..0.011 rows=1 loops=115)|
                                      Index Cond: ((contact_id = p.id) AND (name = 'full_name'::attribute_name) AND (deleted = false))                                                      |
                                      Buffers: shared hit=462                                                                                                                               |
                          ->  Index Scan using attribute_contact_id_name_deleted_index on attribute a  (cost=0.42..59.95 rows=1336 width=52) (actual time=0.004..0.059 rows=139 loops=115)  |
                                Index Cond: ((contact_id = p.id) AND (deleted = false))                                                                                                     |
                                Buffers: shared hit=1344                                                                                                                                    |
Planning Time: 0.467 ms                                                                                                                                                                     |
Execution Time: 285.484 ms                                                                                                                                                                  |

Can you guys give me any suggestion regarding the improvemnets i can do

Thanks

Edit 1 : Updated Explain Plain with Analyze , buffers
Edit 2 : Updated the query to remove Ambiguity

Answers

Quite frequently aggregating first, then joining to the result of the aggregate is faster than the other way round. If you don’t aggregate the attributes into an array of JSON (i.e. jsonb[]) but into a regular JSON value you can test for the presence of the birthday attribute directly.

SELECT p.id,
       a.attributes
FROM person p
  LEFT JOIN ( 
    select person_id, 
           jsonb_object_agg(att.name, att.canonical_value)) as attributes,
           jsonb_object_agg(att.name, MAX(COALESCE(fa.edited_value, fa.value))) as attributes,
    from attribute att
    where att.deleted = false
    group by att.person_id 
  ) a ON a.person_id = p.id
  LEFT JOIN 
      "attribute" fa ON fa.name = 'full_name'
                     AND fa.person_id = p.id
                     AND fa.deleted is false
WHERE p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
  AND a.attributes ? 'birthday'
  AND p.deleted_at IS NULL
ORDER BY LOWER(MAX(COALESCE(fa.edited_value, fa.value)->>'first_name')) ASC NULLS LAST

I find the canonical_value, value and edited_value a bit confusing.

But maybe you can remove the left join for the full_name by creating another JSONB object inside the aggregation that contains the correct first_name and then reference that in the order by. So something along the lines:

SELECT p.id,
       a.attributes
FROM person p
  LEFT JOIN ( 
    select att.person_id, 
           jsonb_object_agg(att.name, att.canonical_value) as attributes,
           jsonb_object_agg(att.name, COALESCE(att.edited_value, fa.value)) as real_values
    from attribute att
    where att.deleted = false
    group by att.person_id 
  ) a ON a.person_id = p.id
WHERE p.som_col_id = 'e046dd1d-3444-4195-9c46-e208b2a51703'
  AND attributes ? 'birthday'
  AND p.deleted_at IS NULL
ORDER BY LOWER(real_values ->> 'first_name') ASC NULLS LAST

- jjanes
- November 24, 2022 at 4:39 pm
- 0 votes
0
Your data model doesn’t very good, and the way your table and column names keep mutating over the course of the question is pretty confusing.

But it looks like you really need an index on "attribute" which starts with "name", or possibly a partial index on it with WHERE deleted is false.

That should help improve the execution of the query whose plan you showed us, but since that only took 0.3s, not 5s, it is hard to know what it would do to the unseen 5s query plan.

Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.