Postgresql - Regex: Match group, multiple time and reset position

Iannick
March 15, 2024
193 views
0 votes
2 Answers

I would like to be able to return all canadian postalcode which can be find in using the following regex(migth have a space like: A0A 0A0 or A0A0A0)
regex [A-Z]d[A-Z]s*d[A-Z]d
The problem is when there is more than 1 match that a part of it it in .
I’ve played with capturing group (too much) and i’m more lost then at the start, oh well.

Given the string: SOMETEXTA0A 0A0OTHERSTUFF
- I would expect the following match
  - A0A 0A0 (ok)
Given the string: SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER
- I would expect the following match:
  - B1A0A0 (ok)
  - A0A0A0 (not found)
  - B2A 0B0 (not found)

Here is the regex101 https://regex101.com/r/ZbrwnP/1

Here is my test setup for postgresql.

with barcodes as(
    select * 
    from
        (values
         ('AAAC9B92WLEDH0G5R0Z0|2024-01-01T01:59:39.379-05')
         , ( 'SOMETEXTJ9H5G0L2J0OTHERSTUFF')
        )b(barcode)
)
select * 
from barcodes  b
left join lateral  (    
    SELECT UNNEST(
        REGEXP_MATCHES(
            b.barcode
            , '[A-Z]d[A-Z]s*d[A-Z]d'
            , 'g'
        )
    ) match
)t on true

Tags: postgresql regex

Answers

- Schwern
- March 13, 2024 at 8:50 pm
- 0 votes
0
In your example SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER…
- B1A0A0 is found.
- A0A0A0 is not found because it overlaps with B1A0A0.
- B2A 0B0 is found.
Normally one deals with overlaps by using a capture group inside a positive look ahead like (?=([A-Z]d[A-Z]s*d[A-Z]d)). Here is a demonstration. Unfortunately in PostgreSQL…

Lookahead and lookbehind constraints cannot contain back references (see Section 9.7.3.3), and all parentheses within them are considered non-capturing.

All you get is empty strings.

You can try installing pgpcre which gives you Perl Compatible Regular Expressions.

This may be something better done by another programming language. Retrieve the data, parse it, and (if you need to do this more than once) insert the parsed postal codes back into the database.
Login or Signup to reply.

- Nick
- March 15, 2024 at 6:04 am
- 0 votes
0
You can achieve the result you want directly in SQL, although it requires a bit of work. Rather than attempting to match the entire regex, we match on just the first character of the regex, with a lookahead assertion to match the balance of the regex. This ensures we find all locations in the string which can match the regex, whether they overlap or not. So we initially use this regex:
```
[A-Z](?=d[A-Z]s*d[A-Z]d)
```
We use this first in regexp_count, to see how many matches there are in the string, and use generate_series to generate a sequence of numbers which represent each individual match. We then use regexp_instr to find the starting position of each match in the string. Finally, we use regexp_substr with the starting positions and the original regex to extract the matches:
```
with barcodes as (
    select * 
    from
        (values
         ('SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER')
         , ( 'SOMETEXTJ9H5G0L2J0OTHERSTUFF')
         , ( 'SOMETEXTA0A 0A0OTHERSTUFF' )
        )b(barcode)
), counts as (
select *
from barcodes b
left join lateral (
  select generate_series as n
  from generate_series(1, regexp_count(barcode, '[A-Z](?=d[A-Z]s*d[A-Z]d)'))
) c on true
),
positions as (
select barcode, regexp_instr(barcode, '[A-Z](?=d[A-Z]s*d[A-Z]d)', 1, n) as start
from counts
)
select barcode, regexp_substr(barcode, '[A-Z]d[A-Z]s*d[A-Z]d', start) as postcode
from positions
```
Output:
```
barcode                                 postcode
SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER  B1A0A0
SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER  A0A0A0
SOMETEXTB1A0A0A0OTHERSTUFFB2A 0B0OTHER  B2A 0B0
SOMETEXTJ9H5G0L2J0OTHERSTUFF            J9H5G0
SOMETEXTJ9H5G0L2J0OTHERSTUFF            H5G0L2
SOMETEXTJ9H5G0L2J0OTHERSTUFF            G0L2J0
SOMETEXTA0A 0A0OTHERSTUFF               A0A 0A0
```
Note I’ve split out the steps individually to make it clear how this works, you could easily combine some of the steps to reduce the number of operations. You could also aggregate the results into an array if desired.

Demo on dbfiddle.uk
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Regex: Match group, multiple time and reset position

Answers