skip to Main Content

In my Postgres 13 DB I want to update a column mother_tongues_id in my table based on name columns.

Names are like “Le Minh Thien” and I want to check if those contain words like ‘le’ or ‘lee’ then set nationality to ‘Vietnam’. The words check table is some thing like this:

language noun
vietnam +lee+le+long+la+
chinese +lu+zhu+

Here is my UPDATE code:

UPDATE resume
SET mother_tongues_id = (
    SELECT id FROM mother_tongues
    WHERE check_language(resume.id) = id
    limit 1);
WHERE mother_tongues_id IS NULL;

Here is my check_language() function:

CREATE OR REPLACE FUNCTION check_language(rid integer) RETURNS integer AS
$BODY$
    DECLARE i int = (SELECT (LENGTH(CONCAT(first_name, ' ',last_name)) - LENGTH(replace(CONCAT(first_name, ' ',last_name), ' ', ''))) FROM resume where id = rid);
    DECLARE sname varchar = '';
    DECLARE sid int = 0
    LOOP
        SET i = i - 1;
        SET sname = SELECT split_part(SELECT CONCAT(first_name, ' ',last_name) FROM resume  where id = rid, ' ', i)
        if select id from mother_tongues WHERE noun ILIKE '%+sname+%' then SET sid = id;
        end if;
        IF i = 1 THEN
            EXIT;
        END IF;
    END LOOP;
    if sid > 0 then RETURN sid;
    else RETURN NULL;
    end if;
END;

$BODY$ 
LANGUAGE plpgsql;

I try to create a function in Postgres with loop to split the name and check the words, however always error.

2

Answers


  1. You have some syntax errors, here is a fix, have a try again.

    CREATE OR REPLACE FUNCTION check_language(rid integer) RETURNS integer AS
    $BODY$
        DECLARE 
            i int = (SELECT (LENGTH(CONCAT(first_name, ' ',last_name)) - LENGTH(replace(CONCAT(first_name, ' ',last_name), ' ', ''))) FROM resume where id = rid);
            sname varchar(255) = '';
            sid int = 0;
        BEGIN
            LOOP
                SET i = i - 1;
                SET sname = (SELECT split_part(CONCAT(first_name, ' ',last_name), ' ', i) FROM resume  where id = rid);
                if select id from mother_tongues WHERE noun ILIKE '%+sname+%' then 
                    SET sid = id;
                end if;
                IF i = 1 THEN
                    EXIT;
                END IF;
            END LOOP;
            if sid > 0 then 
                RETURN sid;
            else 
                RETURN NULL;
            end if;
        END;
    $BODY$ 
    LANGUAGE plpgsql;
    
    UPDATE resume
    SET mother_tongues_id = (
        SELECT id FROM mother_tongues
        WHERE check_language(resume.id) = id
        limit 1
    )
    WHERE mother_tongues_id IS NULL;
    
    Login or Signup to reply.
  2. Your function is extremely inefficient, errors aside.
    You certainly don’t need a function for this to begin with. A plain UPDATE with a correlated subquery matching with the array overlap operator && does the job:

    UPDATE resume
    SET    mother_tongues_id = (
        SELECT m.id
        FROM   mother_tongues m
        WHERE  string_to_array(lower(concat_ws(' ', r.first_name, r.last_name)), ' ')
           &&  string_to_array(trim(m.noun, '+'), '+')
        LIMIT  1
        )
    WHERE  mother_tongues_id IS NULL;
    

    Of course, this is just putting lipstick on a pig. You should really start by fixing your DB schema. As a minimum, convert mother_tongues.noun to text[], so we don’t have to do it on the fly. And add a GIN index on that column. See:

    Or properly normalize the design.

    More fundamentally, deciding nationality based on typical names is a dubious approach. Besides being hardly valid, there can be any number of conflicts, like multiple matches. Currently, LIMIT 1 simply takes an arbitrary match in this case.

    If you really need to loop in a PL/pgSQL function (which you don’t in this case), read here:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search