I’m building a language learning platform; here are some of the tables:
create table users
(
    id bigserial primary key,
    ...
);

create table users_languages
(
    id bigserial primary key,
    user_id bigint not null
        constraint users_languages_user_id_foreign references users,
    level varchar(255) not null,
    lang_code varchar(10) not null,
    ...
);
So users_languages keeps all the languages a user knows; level can be NATIVE or LEARN, and lang_code is the ISO code.
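For illustration, a native English speaker who is learning French and German would have rows like these (made-up example data, not from the question):

-- assumes a users row with id = 1 already exists (foreign key)
insert into users_languages (user_id, level, lang_code)
values (1, 'NATIVE', 'eng'),
       (1, 'LEARN', 'fre'),
       (1, 'LEARN', 'ger');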
I’m building a recommended search feature for the current user: based on the languages they know, the results must be ordered by the following rules (best match first):
- other user NATIVE lang = current user LEARN lang AND other user LEARN lang = current user NATIVE lang
- other user NATIVE lang = current user LEARN lang
- other user has at least one of current user langs
Here is what I figured out so far:
nativeCode := current user native lang_code
langCodes := current user learn lang_codes
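For example, these two parameters could be loaded with queries like the following (a sketch; :currentUserId is an assumed bind parameter):

-- nativeCode: the current user's native language
SELECT lang_code FROM users_languages WHERE user_id = :currentUserId AND level = 'NATIVE';
-- langCodes: the languages the current user is learning
SELECT lang_code FROM users_languages WHERE user_id = :currentUserId AND level = 'LEARN';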
WITH user_lang_priority AS NOT MATERIALIZED (
    SELECT l.user_id,
           MIN(CASE
                   WHEN l.level = 'NATIVE' AND l.lang_code IN (:langCodes) THEN
                       CASE
                           WHEN EXISTS (
                               SELECT ll.id FROM users_languages ll
                               WHERE ll.level != 'NATIVE'
                                 AND ll.lang_code = :nativeCode
                                 AND ll.user_id = l.user_id
                           ) THEN 1 ELSE 2
                       END
                   WHEN l.lang_code IN (:langCodes) THEN 3
               END) AS priority
    FROM users_languages l
    GROUP BY l.user_id
)
SELECT u.*
FROM users u
INNER JOIN user_lang_priority lp ON u.id = lp.user_id
GROUP BY u.id
ORDER BY lp.priority ASC, u.id DESC
The query seems to be returning the correct results, but it takes up to 2 seconds on a table with about 30k rows. Is there a way to speed it up?
UPD: the query plan:

Sort  (cost=777435.38..777508.24 rows=29145 width=762)
  Sort Key: (min(CASE WHEN (((l.level)::text = 'NATIVE'::text) AND ((l.lang_code)::text = ANY ('{fin,fre,ger}'::text[]))) THEN CASE WHEN (hashed SubPlan 2) THEN 1 ELSE 2 END WHEN ((l.lang_code)::text = ANY ('{rus,fre,ger}'::text[])) THEN 3 ELSE NULL::integer END)), u.id DESC
  ->  Hash Join  (cost=1677.14..775274.14 rows=29145 width=762)
        Hash Cond: (l.user_id = u.id)
        ->  GroupAggregate  (cost=0.29..773229.32 rows=29145 width=12)
              Group Key: l.user_id
              ->  Index Scan using users_languages_user_id_index on users_languages l  (cost=0.29..3005.29 rows=84970 width=19)
              SubPlan 2
                ->  Bitmap Heap Scan on users_languages ll  (cost=299.15..1487.36 rows=14990 width=8)
                      Recheck Cond: ((lang_code)::text = 'eng'::text)
                      Filter: ((level)::text <> 'NATIVE'::text)
                      ->  Bitmap Index Scan on users_languages_lang_code_index  (cost=0.00..295.40 rows=22814 width=0)
                            Index Cond: ((lang_code)::text = 'eng'::text)
        ->  Hash  (cost=1253.60..1253.60 rows=33860 width=758)
              ->  Seq Scan on users u  (cost=0.00..1253.60 rows=33860 width=758)
3 Answers
Consider creating indexes on the following columns:
- users_languages.user_id: this should improve the join operation with the users table.
- users_languages.level: to optimise the CASE statements.
- users_languages.lang_code: to optimise the filtering operation and the CASE statements.

After creating these indexes, re-run your query and check the performance. You can also update the statistics using the ANALYZE command:
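A minimal sketch of those statements (the index names are my own; the query plan above suggests the user_id and lang_code indexes may already exist):

-- hypothetical index names, adjust to your naming convention
CREATE INDEX IF NOT EXISTS users_languages_user_id_idx ON users_languages (user_id);
CREATE INDEX IF NOT EXISTS users_languages_level_idx ON users_languages (level);
CREATE INDEX IF NOT EXISTS users_languages_lang_code_idx ON users_languages (lang_code);

-- refresh planner statistics afterwards
ANALYZE users_languages;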
Lastly, if the users_languages table is very large and this query is frequently executed, you might consider using a materialized view to precompute the results.
Depending on how frequently the underlying data changes, you’ll need to schedule regular refreshes of the materialized view.
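A rough sketch of that idea, assuming the materialized view precomputes a priority for every pair of users sharing at least one language (the view name, columns, and query are my own, not from the answer):

-- hypothetical materialized view: best match priority per user pair
CREATE MATERIALIZED VIEW user_match_priority AS
SELECT cur.user_id AS current_user_id,
       oth.user_id AS other_user_id,
       CASE
           WHEN bool_or(oth.level = 'NATIVE' AND cur.level = 'LEARN')
            AND bool_or(oth.level = 'LEARN' AND cur.level = 'NATIVE') THEN 1
           WHEN bool_or(oth.level = 'NATIVE' AND cur.level = 'LEARN') THEN 2
           ELSE 3
       END AS priority
FROM users_languages cur
JOIN users_languages oth
  ON oth.lang_code = cur.lang_code
 AND oth.user_id <> cur.user_id
GROUP BY cur.user_id, oth.user_id;

-- needed for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX ON user_match_priority (current_user_id, other_user_id);

-- run on a schedule; CONCURRENTLY keeps the view readable during the refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY user_match_priority;

The recommendation query then becomes a cheap lookup: join this view to users on other_user_id for a given current_user_id and order by priority.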
Try the following query (view):
If you only need the search for the current user, a simpler query is possible: one join and one grouping, which I think are well supported by indexes.
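The answer's code is not shown above; as a rough guess at its shape, reusing the question's :nativeCode and :langCodes parameters (so not necessarily the answerer's original):

SELECT u.*
FROM users u
JOIN (
    SELECT l.user_id,
           CASE
               WHEN bool_or(l.level = 'NATIVE' AND l.lang_code IN (:langCodes))
                AND bool_or(l.level = 'LEARN' AND l.lang_code = :nativeCode) THEN 1
               WHEN bool_or(l.level = 'NATIVE' AND l.lang_code IN (:langCodes)) THEN 2
               ELSE 3
           END AS priority
    FROM users_languages l
    -- only rows for languages the current user knows; an index on lang_code supports this
    WHERE l.lang_code IN (:langCodes) OR l.lang_code = :nativeCode
    GROUP BY l.user_id
) lp ON lp.user_id = u.id
ORDER BY lp.priority, u.id DESC;

Unlike the original query, users with no common language are filtered out here instead of being sorted last with a NULL priority.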
For test data:
Draft result is:
Query for all users:
Result:
fiddle here