skip to Main Content

I have a user table that contains a "skills" column which is a text array. Given some input array, I would like to find all the users whose skills % one or more of the entries in the input array, and order by number of matches (according to the % operator from pg_trgm).

For example, I have Array[‘java’, ‘ruby’, ‘postgres’] and I want users who have these skills ordered by the number of matches (max is 3 in this case).

I tried unnest() with an inner join. It looked like I was getting somewhere, but I still have no idea how I can capture the count of the matching array entries. Any ideas on what the structure of the query may look like?

Edit: Details:

Here is what my programmers table looks like:

id |            skills             
----+-------------------------------
  1 | {javascript,rails,css}
  2 | {java,"ruby on rails",adobe}
  3 | {typescript,nodejs,expressjs}
  4 | {auth0,c++,redis}

where skills is a text array.

Here is what I have so far:

SELECT * FROM programmers, unnest(skills) skill_array(x)
    INNER JOIN unnest(Array['ruby', 'node']) search(y)
    ON skill_array.x % search.y;

which outputs the following:

id |            skills             |       x       |    y    
----+-------------------------------+---------------+---------
  2 | {java,"ruby on rails",adobe}  | ruby on rails | ruby
  3 | {typescript,nodejs,expressjs} | nodejs        | node
  3 | {typescript,nodejs,expressjs} | expressjs     | express

*Assuming pg_trgm is enabled.

2

Answers


  1. Chosen as BEST ANSWER

    I accepted Edouard's answer, but I thought I'd show something else I adapted from it.

    CREATE OR REPLACE FUNCTION partial_and_and(list1 TEXT[], list2 TEXT[])
       RETURNS BOOLEAN AS $$
       SELECT EXISTS(
           SELECT * FROM unnest(list1) x, unnest(list2) y
           WHERE x % y
       );
    $$ LANGUAGE SQL IMMUTABLE;
    

    Then create the operator:

    CREATE OPERATOR &&% (
        LEFTARG = TEXT[],
        RIGHTARG = TEXT[],
        PROCEDURE = partial_and_and,
        COMMUTATOR = &&%
    );
    

    And finally, the query:

    SELECT p.id, p.skills, array_agg(t_skill) AS inter_skills
    FROM programmers AS p
    CROSS JOIN LATERAL unnest(Array['ruby', 'java']) AS t_skill
    WHERE p.skills &&% array[t_skill]
    GROUP BY p.id, p.skills
    ORDER BY array_length(inter_skills, 1) DESC;
    

    This will output an error saying column 'inter_skills' does not exist (not sure why), but oh well point is the query seems to work. All credit goes to Edouard.


  2. For an exact match between the user skills and the searched skills, you can proceed like this :

    1. You put the searched skills in the target_skills text array
    2. You filter the users from the table user_table whose user_skills array has at least one common element with the target_skills array by using the && operator
    3. For each of the selected users, you select the common skills by using unnest and INTERSECT, and you calculate the number of these common skills
    4. You order the result by the number of common skills DESC

    In this process, the users with skill "ruby" will be selected for the target skill "ruby", but not the users with skill "ruby on rails".

    This process can be implemented as follow :

    SELECT u.user_id
         , u.user_skills
         , inter.skills
    FROM user_table AS u
    CROSS JOIN LATERAL
        (   SELECT array(   SELECT unnest(u.user_skills)
                            INTERSECT
                            SELECT unnest(target_skills)
                        ) AS skills
        ) AS inter
    WHERE u.user_skills && target_skills
    ORDER BY array_length(inter.skills, 1) DESC
    

    or with this variant :

    SELECT u.user_id
         , u.user_skills
         , array_agg(t_skill) AS inter_skills
    FROM user_table AS u
    CROSS JOIN LATERAL unnest(target_skills) AS t_skill
    WHERE u.user_skills && array[t_skill]
    GROUP BY u.user_id, u.user_skills
    ORDER BY array_length(inter_skills, 1) DESC
    

    This query can be accelerated by creating a GIN index on the user_skills column of the user_table.

    For a partial match between the user skills and the target skills (ie the users with skill "ruby on rails" must be selected for the target skill "ruby"), you need to use the pattern matching operator LIKE or the regular expression, but it is not possible to use them with text arrays, so you need first to transform your user_skills text array into a simple text with the function array_to_string. The query becomes :

    SELECT u.user_id
         , u.user_skills
         , array_agg(t_skill) AS inter_skills
    FROM user_table AS u
    CROSS JOIN unnest(target_skills) AS t_skill
    WHERE array_to_string(u.user_skills, ' ') ~ t_skill
    GROUP BY u.user_id, u.user_skills
    ORDER BY array_length(inter_skills, 1) DESC ;
    

    Then you can accelerate the queries by creating the following GIN (or GiST) index :

    DROP INDEX IF EXISTS user_skills ;
    CREATE INDEX user_skills
      ON user_table
      USING gist (array_to_string(user_skills, ' ') gist_trgm_ops) ;    -- gin_trgm_ops and gist_trgm_ops indexes are compliant with the LIKE operator and the regular expressions
    

    In any case, managing the skills as text will ever fail if there are typing errors or if the skills list is not normalized.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search