skip to Main Content

I am working on a project for migrating database from Informix to Postgresql and not able to find a replacement of "MATCHES" in Postgresql. Below is the sql query which I am trying to make work in Postgresql.

SELECT DISTINCT ca.cust_id, a.acct_id FROM cust_acct ca 
JOIN acct a on a.acct_id = ca.acct_id  
JOIN cust_address cas on ca.cust_id = cas.cust_id 
JOIN cust c on ca.cust_id = c.cust_id 
JOIN phone cp on c.cust_id = cp.cust_id 
WHERE a.recog_id MATCHES '*2972*'  
AND cas.last_name MATCHES '*[Q-T]' 
AND cas.frst_name MATCHES '[Uu]*' 
AND cas.city MATCHES '*X'
AND cas.county MATCHES '*'
AND cp.number MATCHES '*54123'

PROBLEM
I don’t find an expression in Postgresql which I can use to replace "MATCHES" and do no change in java code. Although here, https://www.postgresql.org/docs/current/functions-matching.html#:~:text=9.7.2.%C2%A0SIMILAR%20TO%20Regular%20Expressions%20%23the documentation says that SIMILAR TO uses POSIX regular expressions, but I don’t see that happening. Even if I replace ‘*’ with ‘%’ , it doesn’t give the same result in Postgresql:

SELECT DISTINCT ca.cust_id, a.acct_id FROM cust_acct ca 
    JOIN acct a on a.acct_id = ca.acct_id  
    JOIN cust_address cas on ca.cust_id = cas.cust_id 
    JOIN cust c on ca.cust_id = c.cust_id 
    JOIN phone cp on c.cust_id = cp.cust_id 
    WHERE a.recog_id LIKE '%2972%'  
    AND cas.last_name LIKE '%[Q-T]' 
    AND cas.frst_name LIKE '[Uu]%' 
    AND cas.city LIKE '%X'
    AND cas.county LIKE '%'
    AND cp.number LIKE '%54123'

I have tried with "SIMILAR TO" and still doesn’t give the expected result.

QUESTIONS
From https://www.ibm.com/docs/en/informix-servers/14.10?topic=condition-matches-operator
MATCHES '*[Q-T]' -> condition is TRUE for any name that ends with the letters from Q to T. How can I do this in Postgresql?
Is ‘%’ the accepted wildcard in postgresql? and is it be safe to replace all ‘*’ (accepted wildcard for informix) to ‘%’ for regex to work ?

Already tried regex_matches() and that doesn’t work either.

2

Answers


  1. You can do a nearly 1-to-1 translation between matches and similar to, but most of the time like is sufficient. You only need to use similar to for character classes.

    matches similar to like (PostgreSQL)
    * % %
    ? _ _
    [ … ] [ … ]
    [^ … ] [^ … ]
    (however, some combinations are special like n)

    Because similar to adds additional POSIX regex semantics, use like if all you need is wildcards.

    The translation of your example would be…

    WHERE a.recog_id like '%2972%'
    AND cas.last_name similar to '%[Q-T]' 
    AND cas.frst_name similar to '[Uu]%' 
    AND cas.city like '%X'
    -- this is basically `AND cas.county is not null`
    AND cas.county like '%'
    AND cp.number like '%54123'
    
    Login or Signup to reply.
  2. See here:

    If you can, use LIKE (~~), it’s fastest and simplest.

    The double tilde (~~) operator is a PostgreSQL non-standard extension.

    I never use SIMILAR TO.

    (good enough for me), and just to really ram it home:

    More importantly, don’t use SIMILAR TO at all. It’s completely pointless:

    To check the difference between LIKE (ordinary simple match) and ~ (a regexp match) (see the fiddle here):

    CREATE TABLE string
    (
      id    INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
      str   TEXT,   -- str to be tested
      test  TEXT,   -- applicable test
      descr TEXT    -- why it is/isn't picked up
    );
    

    populate with some strings:

    INSERT INTO string (str, test, descr) VALUES
    ('adfadfs', 'Both', 'No match to any pattern'),        -- should not be picked up - doesn't match any pattern
    ('asdfd2972sdfas', 'LIKE', '"2972" matches in middle of str' ),
    ('afadsfasfQ', 'SIM TO', 'Match to capital "Q" at end of str' ),
    ('adfadsfdfsT', 'SIM TO', 'picked up - match to capital "T" at end of str'),
    ('Uasdfasdf', 'SIM TO', 'picked up - match to capital "U" as start of str'),
    ('uadsfasdf', 'Both', 'No pattern matches just lower "u" at start or other chars'), 
    ('fadsfdsX', 'LIKE', 'Matches "X" at end of str'),  
    ('blah', 'Both', 'Not picked up - just noise'),
    (NULL, 'No test', 'If IF NOT NULL was present, every str (other than this) would be picked up!'),             
    ('afad54123', 'LIKE', '"54123" at end of str'),
    ('asdf54123fadsf', 'Both', 'Not be picked up - "54123" in middle, and not end, of str');
    

    and then we run the tests – PostgreSQL version first:

    SELECT  -- could have put "~~" instead of LIKE
      *
    FROM 
      string
    WHERE str LIKE '%2972%'
       OR str ~    '^.*[Q-T]$' 
       OR str ~    '^[U].*$'  
       OR str LIKE '%X'
       OR str LIKE '%54123';
    

    Result:

    id str            test    descr
    2  asdfd2972sdfas LIKE    "2972" matches in middle of str
    3  afadsfasfQ     SIM TO  Match to capital "Q" at end of str
    4  adfadsfdfsT    SIM TO  Picked up - match to capital "T" at end of str
    5  Uasdfasdf      SIM TO  Picked up - match to capital "U" as start of str
    7  fadsfdsX       LIKE    Matches "X" at end of str
    10 afad54123      LIKE    "54123" at end of str
    

    Run the same again with SIMILAR TO in place of the PostgreSQL regexp matcher:

    SELECT
      *
    FROM 
      string
    WHERE str LIKE       '%2972%'
       OR str SIMILAR TO '%[Q-T]' 
       OR str SIMILAR TO '[U]%'  
       OR str LIKE       '%X'
       OR str LIKE '%54123';
    

    Result: Same.

    However, the Difference is seen when we run:

    EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, SETTINGS)
    

    For the PostgreSQL standard query, we have (entire result not shown for brevity):

    Filter: ((string.str ~~ '%2972%'::text) OR (string.str ~ '^.*[Q-T]$'::text) OR (string.str ~ '^[U].*$'::text) OR (string.str ~~ '%X'::text) OR (string.str ~~ '%54123'::text))
    

    and the same line for the SIMILAR TO query is:

    Filter: ((string.str ~~ '%2972%'::text) OR (string.str ~ '^(?:.*[Q-T])$'::text) OR (string.str ~ '^(?:[U].*)$'::text) OR (string.str ~~ '%X'::text) OR (string.str ~~ '%54123'::text))
    

    Note:

    • string.str ~ '^.*[Q-T]$'::text != string.str ~ '^(?:.*[Q-T])$'::text

    and

    • string.str ~ '^[U].*$'::text != string.str ~ '^(?:[U].*)$'::text

    We have a more complex regex for the SIMILAR TO constructs as compared with the ~ (native PostgreSQL) ones. I don’t know about you, but for me more complex = more resources and/or time – KISS! The extra non-capturing group may not be a huge imposition, but it is worth noting.

    This was pointed out by @ErwinBrandstetter here

    Postgres only includes it to comply to the (weird) SQL standard.
    Inspect the output of EXPLAIN ANALYZE for your query and you will find
    that SIMILAR TO has been replaced by a regular expression.

    and:

    Internally every SIMILAR TO expression is rewritten to a regular
    expression. Consequently, for each and every SIMILAR TO expression
    there is at least one regular expression match that is a bit faster.

    So, migrations are not (for better or worse) simply a question of cut’n’paste. You’ll have to get to know, (appreciate even?), the architecture(s) of your source and target – if it were that simple, we wouldn’t all be making the big bucks! 🙂

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search