Postgresql - Extract all substrings bounded by the same characters - PhpOut

jmintz4
August 21, 2022
288 views
3 votes
3 Answers

Given a name_loc column of text like the following:

{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}

I’m trying to extract the names, ideally separated by commas:

Charlie, Wrigley, Ana

I’ve gotten this far:

SELECT SUBSTRING(CAST(name_loc AS VARCHAR) from '"([^ –]+)')
FROM table;

which returns

Charlie

How can I extend this query to extract all names?

Tags: postgresql regex substring

Answers

- Marth
- August 21, 2022 at 4:45 pm
- 0 votes
0
You can do this with a combination of regexp_matches (to extract the names), array_agg (to regroup all matches in a row) and array_to_string (to format the array as you’d like, e.g. with a comma separator):
```
WITH input(name_loc) AS (
    VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
         , ('{"Other - somewhere}') -- added this to show multiple rows do not get merged
)
SELECT array_to_string(names, ', ')
FROM input
CROSS JOIN LATERAL (
    SELECT array_agg(name)
    FROM regexp_matches(name_loc, '"(w+)', 'g') AS f(name)
) AS f(names);
```
array_to_string

Charlie, Wrigley, Ana

Other

View on DB Fiddle
Login or Signup to reply.

- JvdV
- August 21, 2022 at 5:44 pm
- 0 votes
0
My two cents, though I’m rather new to postgreSQL and I had to copy the 1st piece from @Marth’s his answer:
```
WITH input(name_loc) AS (
    VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
         , ('{"Other - somewhere"}') 
)
SELECT REGEXP_REPLACE(name_loc, '{?(,)?"(w+)[^"]+"}?','12', 'g') FROM input;
```
regexp_replace

Charlie,Wrigley,Ana

Other
Login or Signup to reply.

- ErwinBrandstetter
- August 22, 2022 at 3:18 am
- 0 votes
0
Your string literal happens to be a valid array literal.
(Maybe not by coincidence? And the column should be type text[] to begin with?)

If that’s the reliable format, there is a safe and simple solution:
```
SELECT t.id, x.names
FROM   tbl t
CROSS  JOIN LATERAL (
   SELECT string_agg(split_part(elem, ' – ', 1), ', ') AS names
   FROM   unnest(t.name_loc::text[]) elem
   ) x;
```
Or:
```
SELECT id, string_agg(split_part(elem, ' – ', 1), ', ') AS names
FROM   (SELECT id, unnest(name_loc::text[]) AS elem FROM tbl) t 
GROUP  BY id;
```
db<>fiddle here

Steps
1. Unnest the array with unnest() in a LATERAL CROSS JOIN, or directly in the SELECT list.
  - What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
2. Take the first part with split_part(). I chose ' – ' as delimiter, not just ' ', to allow for names with nested space like "Anne Nicole". See:
  - Split comma separated column data into additional columns
3. Aggregate results with string_agg(). I added no particular order as you didn’t specify one.
  - Concatenate multiple result rows of one column into one, group by another column
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.