skip to Main Content

My Postgres database was migrated from MySQL using a tool and the code base has lot of syntax issues.

One of the issues is with the UPDATE statements inside procedures where the column name contains alias name as below.

UPDATE table1 t1 SET t1.col1 = 'some_value';

Having alias name after SET keyword as in t1.col1 is a wrong syntax in Postgres.

As the number of procedures is huge, I’m trying to write a regular expression to find which procedures have this pattern.

select proname, prosrc from pg_proc 
where regexp_replace(prosrc, E'[\n\r]+', ' ', 'g' ) ~* '[:UPDATE:]([[:space:]]+)[:set:]([[:space:]]+)^[a-z]([a-z]|[0-9])+.^[a-z]([a-z]|[0-9])+([[:space:]]*)[:=:]';

The regexp_replace part on the left side of the condition is to remove line breaks which works fine. The main part on the right side is not returning the desired result.

I am trying to find the procedures that has UPDATE keyword followed by one or more space, followed by SET keyword, followed by one more space, followed by one more alphanumeric characters (which starts with an alphabet), followed by a dot(.) , followed by one more alphanumeric characters (which starts with an alphabet), followed by zero or more spaces, followed by an equal to sign (=).

But the statement I formed seems to be wrong. Any help on this is much appreciated.

5

Answers


  1. Chosen as BEST ANSWER

    The below query gives me the expected results. It checks for code where we have SET followed by one or more space, followed by one or more alphanumeric character and _, followed by a dot(.), followed by one or more alphanumeric character and _, followed by one or more spaces and followed by =. This fetches all the procedures that have the issue that I posted in question.

    select proname, prosrc from pg_proc 
    where regexp_replace(prosrc, E'[\n\r]+', ' ', 'g' ) 
    ~* '( SET)[[:space:]]+([a-z]|[0-9]|(_))+.([a-z]|[0-9]|(_))+[[:space:]]+(=)';
    

  2. Yes, in PostgreSQL this is not working:

    UPDATE table1 t1 SET t1.col1 = 'some_value';
    

    But, this is working correctly:

    UPDATE table1 t1 SET col1 = 'some_value';
    

    So we only need to clear the update field alias.
    Example for do it:

    with t1(txt) as (
        select 'UPDATE table1 t1 SET t1.col1 = some_value'
    )
    select regexp_replace(t1.txt, 'SET (.*)d.', 'SET ', 'g') from t1 
    
    Login or Signup to reply.
  3. For finding, selecting:

    with t1(txt) as (
        select 'UPDATE table1 t1 SET t1.col1 = some_value'
    )
    select * from t1 where t1.txt ~ 'SET (.*)d.'
    
    Login or Signup to reply.
  4. I think this may be more complex than you think… A procedure/function may have more than one update statement, and a simple regex will likely come up with many false positives.

    I think you want a function to do a better job of eliminating false positives that result from:

    • Alias that occurs after the update, in a separate statement (after the semicolon) — fix by splitting statements by semicolons
    • Aliases within the update that occur after a FROM or WHERE clause, which are valid and not syntax errors
    • Less frequent, aliases used in a CTE prior to the update – fix by ignoring everything prior to the update keyword

    Here is a boilerplate for what I think will get you close and minimize false positives:

    create or replace function find_bad_syntax()
    returns setof text
    language plpgsql as
    $BODY$
      DECLARE
        r pg_proc%rowtype;
        dml varchar[];
        eval varchar;
        alias varchar;
      BEGIN
        FOR r IN
          SELECT * FROM pg_proc WHERE prosrc ilike '%update%'
        LOOP
          dml := string_to_array (r.prosrc, ';');
          foreach eval in array dml
          loop
            alias := substring (lower (eval), 'update [w.]+s+(w+)');
          
            continue when alias is null or lower (alias) = 'set';
          
            eval := regexp_replace (eval, 'froms+.*', '', 'i');
            eval := regexp_replace (eval, 'wheres.*', '', 'i');
            eval := regexp_replace (eval, '^.*update', '', 'i');
            if eval ~* (alias || '.w+s*=') then
         -- if eval ~* (alias || '.w+s+=') then
              return next format ('PROC: %s ALIAS: %s => ERROR: %s', r.proname, alias, eval);
            end if;
          end loop;
        END LOOP;
      END;
    $BODY$
    

    So to get the results simply:

    select * from find_bad_syntax()
    

    I did a test run, and your function did show up in the results.

    Login or Signup to reply.
  5. Some small changes:

    with t1(txt) as (
        select 'UPDATE table1 t1 SET t1.col1 = some_value'
        union all 
        select 'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
        union all 
        select 'select pp3.* from table1 pp3'
        union all 
        select 'UPDATE table1 SET col1 = some_value'
        union all 
        select 'UPDATE table1 t SET t.col1 = some_value'
    )
    select * from t1 where t1.txt ~ 'SET (.*)w.'
    
    
    --Result: 
    'UPDATE table1 t1 SET t1.col1 = some_value'
    'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
    'UPDATE table1 t SET t.col1 = some_value'
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search