skip to Main Content

I would like to be able to return all canadian postalcode which can be find in using the following regex(migth have a space like: A0A 0A0 or A0A0A0)
regex [A-Z]d[A-Z]s*d[A-Z]d
The problem is when there is more than 1 match that a part of it it in .
I’ve played with capturing group (too much) and i’m more lost then at the start, oh well.

  • Given the string: SOMETEXTA0A 0A0OTHERSTUFF

    • I would expect the following match
      • A0A 0A0 (ok)
  • Given the string: SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER

    • I would expect the following match:
      • B1A0A0 (ok)
      • A0A0A0 (not found)
      • B2A 0B0 (not found)

Here is the regex101 https://regex101.com/r/ZbrwnP/1

Here is my test setup for postgresql.

with barcodes as(
    select * 
    from
        (values
         ('AAAC9B92WLEDH0G5R0Z0|2024-01-01T01:59:39.379-05')
         , ( 'SOMETEXTJ9H5G0L2J0OTHERSTUFF')
        )b(barcode)
)
select * 
from barcodes  b
left join lateral  (    
    SELECT UNNEST(
        REGEXP_MATCHES(
            b.barcode
            , '[A-Z]d[A-Z]s*d[A-Z]d'
            , 'g'
        )
    ) match
)t on true

2

Answers


  1. In your example SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER

    • B1A0A0 is found.
    • A0A0A0 is not found because it overlaps with B1A0A0.
    • B2A 0B0 is found.

    Normally one deals with overlaps by using a capture group inside a positive look ahead like (?=([A-Z]d[A-Z]s*d[A-Z]d)). Here is a demonstration. Unfortunately in PostgreSQL

    Lookahead and lookbehind constraints cannot contain back references (see Section 9.7.3.3), and all parentheses within them are considered non-capturing.

    All you get is empty strings.

    You can try installing pgpcre which gives you Perl Compatible Regular Expressions.

    This may be something better done by another programming language. Retrieve the data, parse it, and (if you need to do this more than once) insert the parsed postal codes back into the database.

    Login or Signup to reply.
  2. You can achieve the result you want directly in SQL, although it requires a bit of work. Rather than attempting to match the entire regex, we match on just the first character of the regex, with a lookahead assertion to match the balance of the regex. This ensures we find all locations in the string which can match the regex, whether they overlap or not. So we initially use this regex:

    [A-Z](?=d[A-Z]s*d[A-Z]d)
    

    We use this first in regexp_count, to see how many matches there are in the string, and use generate_series to generate a sequence of numbers which represent each individual match. We then use regexp_instr to find the starting position of each match in the string. Finally, we use regexp_substr with the starting positions and the original regex to extract the matches:

    with barcodes as (
        select * 
        from
            (values
             ('SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER')
             , ( 'SOMETEXTJ9H5G0L2J0OTHERSTUFF')
             , ( 'SOMETEXTA0A 0A0OTHERSTUFF' )
            )b(barcode)
    ), counts as (
    select *
    from barcodes b
    left join lateral (
      select generate_series as n
      from generate_series(1, regexp_count(barcode, '[A-Z](?=d[A-Z]s*d[A-Z]d)'))
    ) c on true
    ),
    positions as (
    select barcode, regexp_instr(barcode, '[A-Z](?=d[A-Z]s*d[A-Z]d)', 1, n) as start
    from counts
    )
    select barcode, regexp_substr(barcode, '[A-Z]d[A-Z]s*d[A-Z]d', start) as postcode
    from positions
    

    Output:

    barcode                                 postcode
    SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER  B1A0A0
    SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER  A0A0A0
    SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER  B2A 0B0
    SOMETEXTJ9H5G0L2J0OTHERSTUFF            J9H5G0
    SOMETEXTJ9H5G0L2J0OTHERSTUFF            H5G0L2
    SOMETEXTJ9H5G0L2J0OTHERSTUFF            G0L2J0
    SOMETEXTA0A 0A0OTHERSTUFF               A0A 0A0
    

    Note I’ve split out the steps individually to make it clear how this works, you could easily combine some of the steps to reduce the number of operations. You could also aggregate the results into an array if desired.

    Demo on dbfiddle.uk

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search