Ex Table:
Name | Location |
---|---|
John Smith | NewYork1131413431 |
John Smith | NewYork2222213 |
Marc White | Boston111118174124712 |
Sarah Phillips | NewYork2222123 |
Sarah Phillips | Boston2222918 |
I’m trying to query only the names in which they exist in 2 or more cities (for the case of this dataset, it will almost always be 2 cities max). So from this table, only Sarah Philips should return.
I wrote a query with the ‘like NewYork% or like Boston%’ operators with having count(location) > 1. But doing that would return John Smith as well because the NewYork string has multiple variations. The characters after the city name are also random in length.
2
Answers
If the data is always in this form, meaning the location consists of a town followed by some digits, you can use
SUBSTRING
with a regex check that fetches the substring before the first digit.Then put this condition in the
HAVING
clause.This will as requested only return those people who occur in different locations, see this sample fiddle.
Note: It would be better to save the city and the rest of the string (the digits) in separate columns, this would make things easier.
Another note: You need to extend this solution if you have more possible cases to cover.
For example, if the town
NewYork
is sometimes written asNew York
, you likely want to treat both as the same town.Then you would add a
REPLACE
to get rid of the space and change the condition to...HAVING COUNT (DISTINCT SUBSTRING(REPLACE(Location,' ',''),'D+')) > 1
In case this is not sufficient and you need further assistance, please explain all possible cases you need to cover.
Hello another way to solve this in PostgreSql is by using Regexp_Peplace build in function:
This is one way with the help of having clause:
This is another way using Exists operator:
You may also check the fiddle created for this case.
https://dbfiddle.uk/_cIOdooQ
Hope it helps.