I have a user table that contains a "skills" column which is a text array. Given some input array, I would like to find all the users whose skills % one or more of the entries in the input array, and order by number of matches (according to the % operator from pg_trgm).
For example, I have Array[‘java’, ‘ruby’, ‘postgres’] and I want users who have these skills ordered by the number of matches (max is 3 in this case).
I tried unnest()
with an inner join
. It looked like I was getting somewhere, but I still have no idea how I can capture the count of the matching array entries. Any ideas on what the structure of the query may look like?
Edit: Details:
Here is what my programmers
table looks like:
id | skills
----+-------------------------------
1 | {javascript,rails,css}
2 | {java,"ruby on rails",adobe}
3 | {typescript,nodejs,expressjs}
4 | {auth0,c++,redis}
where skills
is a text array.
Here is what I have so far:
SELECT * FROM programmers, unnest(skills) skill_array(x)
INNER JOIN unnest(Array['ruby', 'node']) search(y)
ON skill_array.x % search.y;
which outputs the following:
id | skills | x | y
----+-------------------------------+---------------+---------
2 | {java,"ruby on rails",adobe} | ruby on rails | ruby
3 | {typescript,nodejs,expressjs} | nodejs | node
3 | {typescript,nodejs,expressjs} | expressjs | express
*Assuming pg_trgm is enabled.
2
Answers
I accepted Edouard's answer, but I thought I'd show something else I adapted from it.
Then create the operator:
And finally, the query:
This will output an error saying column 'inter_skills' does not exist (not sure why), but oh well point is the query seems to work. All credit goes to Edouard.
For an exact match between the user skills and the searched skills, you can proceed like this :
&&
operatorunnest
andINTERSECT
, and you calculate the number of these common skillsDESC
In this process, the users with skill "ruby" will be selected for the target skill "ruby", but not the users with skill "ruby on rails".
This process can be implemented as follow :
or with this variant :
This query can be accelerated by creating a GIN index on the user_skills column of the user_table.
For a partial match between the user skills and the target skills (ie the users with skill "ruby on rails" must be selected for the target skill "ruby"), you need to use the pattern matching operator
LIKE
or theregular expression
, but it is not possible to use them with text arrays, so you need first to transform your user_skills text array into a simple text with the functionarray_to_string
. The query becomes :Then you can accelerate the queries by creating the following
GIN
(orGiST
) index :In any case, managing the skills as text will ever fail if there are typing errors or if the skills list is not normalized.