Postgresql - Postgres regular expression to find procedures with wrong Update syntax

Mano
September 13, 2022
137 views
5 votes
5 Answers

My Postgres database was migrated from MySQL using a tool and the code base has lot of syntax issues.

One of the issues is with the UPDATE statements inside procedures where the column name contains alias name as below.

UPDATE table1 t1 SET t1.col1 = 'some_value';

Having alias name after SET keyword as in t1.col1 is a wrong syntax in Postgres.

As the number of procedures is huge, I’m trying to write a regular expression to find which procedures have this pattern.

select proname, prosrc from pg_proc 
where regexp_replace(prosrc, E'[\n\r]+', ' ', 'g' ) ~* '[:UPDATE:]([[:space:]]+)[:set:]([[:space:]]+)^[a-z]([a-z]|[0-9])+.^[a-z]([a-z]|[0-9])+([[:space:]]*)[:=:]';

The regexp_replace part on the left side of the condition is to remove line breaks which works fine. The main part on the right side is not returning the desired result.

I am trying to find the procedures that has UPDATE keyword followed by one or more space, followed by SET keyword, followed by one more space, followed by one more alphanumeric characters (which starts with an alphabet), followed by a dot(.) , followed by one more alphanumeric characters (which starts with an alphabet), followed by zero or more spaces, followed by an equal to sign (=).

But the statement I formed seems to be wrong. Any help on this is much appreciated.

Tags: postgresql regex

Answers

Chosen as BEST ANSWER
- Mano
- September 14, 2022 at 1:50 pm
- 0 votes
0
The below query gives me the expected results. It checks for code where we have SET followed by one or more space, followed by one or more alphanumeric character and _, followed by a dot(.), followed by one or more alphanumeric character and _, followed by one or more spaces and followed by =. This fetches all the procedures that have the issue that I posted in question.
```
select proname, prosrc from pg_proc 
where regexp_replace(prosrc, E'[\n\r]+', ' ', 'g' ) 
~* '( SET)[[:space:]]+([a-z]|[0-9]|(_))+.([a-z]|[0-9]|(_))+[[:space:]]+(=)';
```

(Edit)

- RaminFaracov
- September 13, 2022 at 7:02 pm
- 0 votes
0
Yes, in PostgreSQL this is not working:
```
UPDATE table1 t1 SET t1.col1 = 'some_value';
```
But, this is working correctly:
```
UPDATE table1 t1 SET col1 = 'some_value';
```
So we only need to clear the update field alias.
Example for do it:
```
with t1(txt) as (
    select 'UPDATE table1 t1 SET t1.col1 = some_value'
)
select regexp_replace(t1.txt, 'SET (.*)d.', 'SET ', 'g') from t1 
```
Login or Signup to reply.

- RaminFaracov
- September 13, 2022 at 7:59 pm
- 0 votes
0
For finding, selecting:
```
with t1(txt) as (
    select 'UPDATE table1 t1 SET t1.col1 = some_value'
)
select * from t1 where t1.txt ~ 'SET (.*)d.'
```
Login or Signup to reply.

- Hambone
- September 13, 2022 at 8:33 pm
- 0 votes
0
I think this may be more complex than you think… A procedure/function may have more than one update statement, and a simple regex will likely come up with many false positives.

I think you want a function to do a better job of eliminating false positives that result from:
- Alias that occurs after the update, in a separate statement (after the semicolon) — fix by splitting statements by semicolons
- Aliases within the update that occur after a FROM or WHERE clause, which are valid and not syntax errors
- Less frequent, aliases used in a CTE prior to the update – fix by ignoring everything prior to the update keyword
Here is a boilerplate for what I think will get you close and minimize false positives:
```
create or replace function find_bad_syntax()
returns setof text
language plpgsql as
$BODY$
  DECLARE
    r pg_proc%rowtype;
    dml varchar[];
    eval varchar;
    alias varchar;
  BEGIN
    FOR r IN
      SELECT * FROM pg_proc WHERE prosrc ilike '%update%'
    LOOP
      dml := string_to_array (r.prosrc, ';');
      foreach eval in array dml
      loop
        alias := substring (lower (eval), 'update [w.]+s+(w+)');
      
        continue when alias is null or lower (alias) = 'set';
      
        eval := regexp_replace (eval, 'froms+.*', '', 'i');
        eval := regexp_replace (eval, 'wheres.*', '', 'i');
        eval := regexp_replace (eval, '^.*update', '', 'i');
        if eval ~* (alias || '.w+s*=') then
     -- if eval ~* (alias || '.w+s+=') then
          return next format ('PROC: %s ALIAS: %s => ERROR: %s', r.proname, alias, eval);
        end if;
      end loop;
    END LOOP;
  END;
$BODY$
```
So to get the results simply:
```
select * from find_bad_syntax()
```
I did a test run, and your function did show up in the results.
Login or Signup to reply.

Some small changes:

with t1(txt) as (
    select 'UPDATE table1 t1 SET t1.col1 = some_value'
    union all 
    select 'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
    union all 
    select 'select pp3.* from table1 pp3'
    union all 
    select 'UPDATE table1 SET col1 = some_value'
    union all 
    select 'UPDATE table1 t SET t.col1 = some_value'
)
select * from t1 where t1.txt ~ 'SET (.*)w.'


--Result: 
'UPDATE table1 t1 SET t1.col1 = some_value'
'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
'UPDATE table1 t SET t.col1 = some_value'

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Postgres regular expression to find procedures with wrong Update syntax

Answers