How can I improve the performance of the query bellow?
Currently for a set of 110k entries in the products table and ~ 1.8 million ids returned by the union all selection, the whole query will take about 8 seconds to return.
If I can get it below 1 second it would be super.
SELECT *
FROM products p
WHERE p.id IN (
(SELECT id FROM products WHERE universal_product_code ILIKE '%t%' OR model ILIKE '%t%' OR isbn ILIKE '%t%')
UNION ALL
(SELECT product_id FROM products_descriptions WHERE (title ILIKE '%t%' OR description ILIKE '%t%' OR ts_title @@ websearch_to_tsquery('english'::regconfig, 't') OR ts_description @@ websearch_to_tsquery('english'::regconfig, 't')) AND locale = 'en' )
UNION ALL
(SELECT product_id FROM products_contents WHERE content_id IN (SELECT id FROM contents WHERE (value ILIKE '%t%' OR ts_value @@ websearch_to_tsquery('english'::regconfig, 't')) AND locale = 'en' ))
UNION ALL
(SELECT product_id FROM products_tags WHERE tag_id IN (SELECT id FROM tags WHERE (value ILIKE '%t%' OR ts_value @@ websearch_to_tsquery('english'::regconfig, 't')) AND locale = 'en' ))
UNION ALL
(SELECT product_id FROM products_labels WHERE label_id IN (SELECT id FROM labels WHERE (value ILIKE '%t%' OR ts_value @@ websearch_to_tsquery('english'::regconfig, 't')) AND locale = 'en' ))
UNION ALL
(SELECT product_id FROM products_brands WHERE brand_id IN (SELECT id FROM brands WHERE (value ILIKE '%t%' OR ts_value @@ websearch_to_tsquery('english'::regconfig, 't')) AND locale = 'en' ))
UNION ALL
(SELECT product_id FROM products_nutrition_facts WHERE id IN (SELECT product_nutrition_fact_id FROM products_nutrition_facts_names WHERE nutrient_name_id IN (SELECT id FROM nutrients_names WHERE (value ILIKE '%t%' OR ts_value @@ websearch_to_tsquery('english'::regconfig, 't')) AND locale = 'en' )))
)
LIMIT 10 OFFSET 0;
Run separately, the union select takes about 0.2 seconds to complete.
My postgres server version is : PostgreSQL 15.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r4) 12.2.1 20220924, 64-bit
Thanks
PS the explain(analyze, verbose, buffers, settings) output looks like this:
Limit (cost=98453.56..98453.80 rows=10 width=578) (actual time=8189.275..8189.372 rows=10 loops=1)
Output: p.id, p.universal_product_code, p.mass, p.display_mass_unit, p.length, p.width, p.height, p.display_lenght_unit, p.volume, p.display_volume_unit, p.category, p.creation_time, p.update_time, p.username, p.nutrition_facts_per_quantity, p.nutrition_facts_unit, p.rating, p.model, p.isbn, p.normalized_length, p.normalized_width, p.normalized_height, p.normalized_volume, p.normalized_mass, p.link1, p.link2, p.link3, p.discontinued
Buffers: shared hit=2795010 read=359171, temp read=4712 written=7124
-> Nested Loop (cost=98453.56..99752.23 rows=55416 width=578) (actual time=8189.274..8189.369 rows=10 loops=1)
Output: p.id, p.universal_product_code, p.mass, p.display_mass_unit, p.length, p.width, p.height, p.display_lenght_unit, p.volume, p.display_volume_unit, p.category, p.creation_time, p.update_time, p.username, p.nutrition_facts_per_quantity, p.nutrition_facts_unit, p.rating, p.model, p.isbn, p.normalized_length, p.normalized_width, p.normalized_height, p.normalized_volume, p.normalized_mass, p.link1, p.link2, p.link3, p.discontinued
Inner Unique: true
Buffers: shared hit=2795010 read=359171, temp read=4712 written=7124
-> HashAggregate (cost=98453.14..98455.14 rows=200 width=16) (actual time=8189.217..8189.230 rows=10 loops=1)
Output: products.id
Group Key: products.id
Batches: 5 Memory Usage: 11073kB Disk Usage: 744kB
Buffers: shared hit=2794994 read=359147, temp read=4712 written=7124
-> Append (cost=0.00..95351.89 rows=1240501 width=16) (actual time=132.138..7045.239 rows=1838901 loops=1)
Buffers: shared hit=2794994 read=359147, temp read=4712 written=6970
-> Seq Scan on public.products (cost=0.00..4559.54 rows=11 width=16) (actual time=132.119..132.120 rows=0 loops=1)
Output: products.id
Filter: (((products.universal_product_code)::text ~~* '%t%'::text) OR ((products.model)::text ~~* '%t%'::text) OR ((products.isbn)::text ~~* '%t%'::text))
Rows Removed by Filter: 110686
Buffers: shared hit=10 read=2610
-> Seq Scan on public.products_descriptions (cost=0.00..5103.81 rows=80292 width=16) (actual time=0.016..210.044 rows=73857 loops=1)
Output: products_descriptions.product_id
Filter: (((products_descriptions.locale)::text = 'en'::text) AND (((products_descriptions.title)::text ~~* '%t%'::text) OR (products_descriptions.description ~~* '%t%'::text) OR (products_descriptions.ts_title @@ ''::tsquery) OR (products_descriptions.ts_description @@ ''::tsquery)))
Rows Removed by Filter: 36821
Buffers: shared hit=1 read=2613
-> Hash Join (cost=2289.08..40316.24 rows=968689 width=16) (actual time=72.685..827.627 rows=774091 loops=1)
Output: products_contents.product_id
Inner Unique: true
Hash Cond: (products_contents.content_id = contents.id)
Buffers: shared hit=2 read=17497
-> Seq Scan on public.products_contents (cost=0.00..33484.08 rows=1730508 width=32) (actual time=0.014..228.075 rows=1729920 loops=1)
Output: products_contents.product_id, products_contents.content_id, products_contents.username, products_contents.creation_time
Buffers: shared hit=1 read=16178
-> Hash (cost=2012.28..2012.28 rows=22144 width=16) (actual time=72.456..72.457 rows=25991 loops=1)
Output: contents.id
Buckets: 32768 Batches: 1 Memory Usage: 1475kB
Buffers: shared hit=1 read=1319
-> Seq Scan on public.contents (cost=0.00..2012.28 rows=22144 width=16) (actual time=0.025..63.276 rows=25991 loops=1)
Output: contents.id
Filter: (((contents.locale)::text = 'en'::text) AND (((contents.value)::text ~~* '%t%'::text) OR (contents.ts_value @@ ''::tsquery)))
Rows Removed by Filter: 15966
Buffers: shared hit=1 read=1319
-> Hash Join (cost=434.27..7437.79 rows=141699 width=16) (actual time=5.281..150.028 rows=202554 loops=1)
Output: products_tags.product_id
Inner Unique: true
Hash Cond: (products_tags.tag_id = tags.id)
Buffers: shared hit=1 read=3348
-> Seq Scan on public.products_tags (cost=0.00..6166.40 rows=318440 width=32) (actual time=0.028..43.003 rows=318449 loops=1)
Output: products_tags.product_id, products_tags.tag_id, products_tags.username, products_tags.creation_time
Buffers: shared read=2982
-> Hash (cost=418.05..418.05 rows=1298 width=16) (actual time=5.227..5.228 rows=1525 loops=1)
Output: tags.id
Buckets: 2048 Batches: 1 Memory Usage: 88kB
Buffers: shared hit=1 read=366
-> Seq Scan on public.tags (cost=0.00..418.05 rows=1298 width=16) (actual time=0.022..4.751 rows=1525 loops=1)
Output: tags.id
Filter: (((tags.locale)::text = 'en'::text) AND (((tags.value)::text ~~* '%t%'::text) OR (tags.ts_value @@ ''::tsquery)))
Rows Removed by Filter: 1522
Buffers: shared hit=1 read=366
-> Hash Join (cost=50.31..583.75 rows=10897 width=16) (actual time=0.968..10.939 rows=11935 loops=1)
Output: products_labels.product_id
Inner Unique: true
Hash Cond: (products_labels.label_id = labels.id)
Buffers: shared hit=2 read=264
-> Seq Scan on public.products_labels (cost=0.00..469.55 rows=24155 width=32) (actual time=0.007..3.247 rows=23878 loops=1)
Output: products_labels.product_id, products_labels.label_id, products_labels.username, products_labels.creation_time
Buffers: shared hit=1 read=227
-> Hash (cost=47.31..47.31 rows=240 width=16) (actual time=0.922..0.923 rows=286 loops=1)
Output: labels.id
Buckets: 1024 Batches: 1 Memory Usage: 22kB
Buffers: shared hit=1 read=37
-> Seq Scan on public.labels (cost=0.00..47.31 rows=240 width=16) (actual time=0.021..0.833 rows=286 loops=1)
Output: labels.id
Filter: (((labels.locale)::text = 'en'::text) AND (((labels.value)::text ~~* '%t%'::text) OR (labels.ts_value @@ ''::tsquery)))
Rows Removed by Filter: 317
Buffers: shared hit=1 read=37
-> Hash Join (cost=205.12..1625.43 rows=29265 width=16) (actual time=8.221..35.553 rows=29860 loops=1)
Output: products_brands.product_id
Inner Unique: true
Hash Cond: (products_brands.brand_id = brands.id)
Buffers: shared hit=1 read=692
-> Seq Scan on public.products_brands (cost=0.00..1251.09 rows=64409 width=32) (actual time=0.008..8.753 rows=64346 loops=1)
Output: products_brands.product_id, products_brands.brand_id, products_brands.username, products_brands.creation_time
Buffers: shared read=607
-> Hash (cost=175.93..175.93 rows=2335 width=16) (actual time=8.170..8.170 rows=2552 loops=1)
Output: brands.id
Buckets: 4096 Batches: 1 Memory Usage: 152kB
Buffers: shared hit=1 read=85
-> Seq Scan on public.brands (cost=0.00..175.93 rows=2335 width=16) (actual time=0.017..7.382 rows=2552 loops=1)
Output: brands.id
Filter: (((brands.locale)::text = 'en'::text) AND (((brands.value)::text ~~* '%t%'::text) OR (brands.ts_value @@ ''::tsquery)))
Rows Removed by Filter: 2827
Buffers: shared hit=1 read=85
-> Nested Loop (cost=11981.95..17117.83 rows=9648 width=16) (actual time=882.067..5453.902 rows=746604 loops=1)
Output: products_nutrition_facts.product_id
Inner Unique: true
Buffers: shared hit=2794977 read=332123, temp read=4712 written=6970
-> HashAggregate (cost=11981.52..12078.00 rows=9648 width=16) (actual time=882.035..1553.084 rows=746604 loops=1)
Output: products_nutrition_facts_names.product_nutrition_fact_id
Group Key: products_nutrition_facts_names.product_nutrition_fact_id
Batches: 21 Memory Usage: 11065kB Disk Usage: 23888kB
Buffers: shared hit=128716 read=11836, temp read=4712 written=6970
-> Nested Loop (cost=255.36..11957.40 rows=9648 width=16) (actual time=7.121..498.089 rows=746604 loops=1)
Output: products_nutrition_facts_names.product_nutrition_fact_id
Buffers: shared hit=128716 read=11836
-> Seq Scan on public.nutrients_names (cost=0.00..4.17 rows=1 width=16) (actual time=0.016..0.271 rows=39 loops=1)
Output: nutrients_names.id, nutrients_names.value, nutrients_names.locale, nutrients_names.username, nutrients_names.creation_time, nutrients_names.update_time, nutrients_names.language, nutrients_names.ts_value
Filter: (((nutrients_names.locale)::text = 'en'::text) AND (((nutrients_names.value)::text ~~* '%t%'::text) OR (nutrients_names.ts_value @@ ''::tsquery)))
Rows Removed by Filter: 41
Buffers: shared read=2
-> Bitmap Heap Scan on public.products_nutrition_facts_names (cost=255.36..11727.51 rows=22572 width=32) (actual time=1.869..9.512 rows=19144 loops=39)
Output: products_nutrition_facts_names.product_nutrition_fact_id, products_nutrition_facts_names.nutrient_name_id, products_nutrition_facts_names.username, products_nutrition_facts_names.creation_time
Recheck Cond: (products_nutrition_facts_names.nutrient_name_id = nutrients_names.id)
Heap Blocks: exact=139830
Buffers: shared hit=128716 read=11834
-> Bitmap Index Scan on products_nutrition_facts_names_nutrient_name_id_idx (cost=0.00..249.72 rows=22572 width=0) (actual time=1.047..1.047 rows=19144 loops=39)
Index Cond: (products_nutrition_facts_names.nutrient_name_id = nutrients_names.id)
Buffers: shared hit=75 read=645
-> Index Scan using products_nutrition_facts_id_idx on public.products_nutrition_facts (cost=0.43..0.52 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=746604)
Output: products_nutrition_facts.product_id, products_nutrition_facts.id
Index Cond: (products_nutrition_facts.id = products_nutrition_facts_names.product_nutrition_fact_id)
Buffers: shared hit=2666237 read=320287
-> Index Scan using products_id_idx on public.products p (cost=0.42..7.70 rows=1 width=578) (actual time=0.013..0.013 rows=1 loops=10)
Output: p.id, p.universal_product_code, p.mass, p.display_mass_unit, p.length, p.width, p.height, p.display_lenght_unit, p.volume, p.display_volume_unit, p.category, p.creation_time, p.update_time, p.username, p.nutrition_facts_per_quantity, p.nutrition_facts_unit, p.rating, p.model, p.isbn, p.normalized_length, p.normalized_width, p.normalized_height, p.normalized_volume, p.normalized_mass, p.link1, p.link2, p.link3, p.discontinued
Index Cond: (p.id = products.id)
Buffers: shared hit=16 read=24
Settings: search_path = 'public, public, "$user"'
Planning:
Buffers: shared hit=76
Planning Time: 3.858 ms
Execution Time: 8194.961 ms
products table:
CREATE TABLE public.products (
id uuid NOT NULL,
universal_product_code varchar(50) NULL,
mass numeric NULL,
display_mass_unit varchar(10) NULL,
length numeric NULL,
width numeric NULL,
height numeric NULL,
display_lenght_unit varchar(10) NULL,
volume numeric NULL,
display_volume_unit varchar(10) NULL,
category int2 NOT NULL,
creation_time timestamp NOT NULL,
update_time timestamp NOT NULL,
username varchar(255) NOT NULL,
nutrition_facts_per_quantity numeric NULL,
nutrition_facts_unit varchar(10) NULL DEFAULT NULL::character varying,
rating numeric NULL,
model varchar(100) NULL,
isbn varchar(50) NULL,
normalized_length numeric NULL,
normalized_width numeric NULL,
normalized_height numeric NULL,
normalized_volume numeric NULL,
normalized_mass numeric NULL,
link1 varchar(2048) NULL DEFAULT NULL::character varying,
link2 varchar(2048) NULL DEFAULT NULL::character varying,
link3 varchar(2048) NULL DEFAULT NULL::character varying,
discontinued bool NULL DEFAULT false,
CONSTRAINT product_pkey PRIMARY KEY (id)
);
CREATE INDEX products_category_idx ON public.products USING btree (category);
CREATE INDEX products_id_idx ON public.products USING btree (id);
CREATE INDEX products_isbn_idx ON public.products USING btree (isbn);
CREATE INDEX products_model_idx ON public.products USING btree (model);
CREATE INDEX products_universal_product_code_idx ON public.products USING btree (universal_product_code);
Products_descriptions table:
CREATE TABLE public.products_descriptions (
id uuid NOT NULL,
product_id uuid NOT NULL,
locale varchar(20) NOT NULL,
title varchar(255) NOT NULL,
description text NULL,
username varchar(255) NOT NULL,
creation_time timestamp NOT NULL,
update_time timestamp NOT NULL,
short_description varchar(450) NULL,
"language" regconfig NOT NULL DEFAULT 'simple'::regconfig,
ts_title tsvector NULL GENERATED ALWAYS AS (to_tsvector(language, title::text)) STORED,
ts_description tsvector NULL GENERATED ALWAYS AS (to_tsvector(language, description)) STORED,
CONSTRAINT product_description_pkey PRIMARY KEY (id),
CONSTRAINT products_descriptions_product_id_locale_key UNIQUE (product_id, locale),
CONSTRAINT products_descriptions_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products(id) ON DELETE CASCADE
);
CREATE INDEX pd_description_idx ON public.products_descriptions USING pgroonga (description);
CREATE INDEX pd_title_idx ON public.products_descriptions USING pgroonga (title);
CREATE INDEX pd_ts_description_idx ON public.products_descriptions USING gin (ts_description);
CREATE INDEX pd_ts_title_idx ON public.products_descriptions USING gin (ts_title);
CREATE INDEX products_descriptions_id_idx ON public.products_descriptions USING btree (id);
CREATE INDEX products_descriptions_locale_idx ON public.products_descriptions USING btree (locale);
CREATE INDEX products_descriptions_product_id_idx ON public.products_descriptions USING btree (product_id);
products_contents table:
CREATE TABLE public.products_contents (
product_id uuid NOT NULL,
content_id uuid NOT NULL,
username varchar(255) NOT NULL,
creation_time timestamp NOT NULL,
CONSTRAINT products_contents_product_id_content_id_key UNIQUE (product_id, content_id),
CONSTRAINT products_contents_content_id_fkey FOREIGN KEY (content_id) REFERENCES public.contents(id) ON DELETE CASCADE,
CONSTRAINT products_contents_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products(id) ON DELETE CASCADE
);
CREATE INDEX products_contents_content_id_idx ON public.products_contents USING btree (content_id);
CREATE INDEX products_contents_product_id_idx ON public.products_contents USING btree (product_id);
contents table:
CREATE TABLE public.contents (
id uuid NOT NULL,
value varchar(100) NOT NULL,
locale varchar(30) NOT NULL,
username varchar(255) NOT NULL,
creation_time timestamp NOT NULL,
update_time timestamp NOT NULL,
"language" regconfig NOT NULL DEFAULT 'simple'::regconfig,
ts_value tsvector NULL GENERATED ALWAYS AS (to_tsvector(language, value::text)) STORED,
CONSTRAINT contents_pkey PRIMARY KEY (id),
CONSTRAINT contents_value_locale_key UNIQUE (value, locale)
);
CREATE INDEX c_ts_value_idx ON public.contents USING gin (ts_value);
CREATE INDEX c_value_idx ON public.contents USING pgroonga (value);
CREATE INDEX contents_id_idx ON public.contents USING btree (id);
CREATE INDEX contents_locale_idx ON public.contents USING btree (locale);
3
Answers
I've ended up merging all the values, in the union selects, into a single columns, which I'm using just for the search functionality. With this approach + some extra multi columns indexes, I've managed to get results back in less than a second.
Thanks all for your feedbacks!
Convert these to JOINs, and then do the WHERE on the joined sets. The advantage is this better preserves the source, rather than building up a working set, so you might also do better matching with an index for the final JOIN.
Note: I had to make some assumptions about which tables sourced which columns and how those columns fit together, meaning it’s possible you could further simplify this to reduce some repetition in the
WHERE
clause. It’s also possible you could convert some of theLEFT JOIN
s toINNER JOIN
s in cases where you know there will always be dependent data (ie: if every product will always have tags, that can be an INNER JOIN, and that might give you a performance boost).IMHO, you should test with different SQL to get the best average for you.ie: