skip to Main Content

Ex Table:

Name Location
John Smith NewYork1131413431
John Smith NewYork2222213
Marc White Boston111118174124712
Sarah Phillips NewYork2222123
Sarah Phillips Boston2222918

I’m trying to query only the names in which they exist in 2 or more cities (for the case of this dataset, it will almost always be 2 cities max). So from this table, only Sarah Philips should return.

I wrote a query with the ‘like NewYork% or like Boston%’ operators with having count(location) > 1. But doing that would return John Smith as well because the NewYork string has multiple variations. The characters after the city name are also random in length.

2

Answers


  1. If the data is always in this form, meaning the location consists of a town followed by some digits, you can use SUBSTRING with a regex check that fetches the substring before the first digit.

    Then put this condition in the HAVING clause.

    SELECT
      name
    FROM yourtable
    GROUP BY
      name
    HAVING 
      COUNT(DISTINCT SUBSTRING(Location,'D+')) > 1;
    

    This will as requested only return those people who occur in different locations, see this sample fiddle.

    Note: It would be better to save the city and the rest of the string (the digits) in separate columns, this would make things easier.

    Another note: You need to extend this solution if you have more possible cases to cover.

    For example, if the town NewYork is sometimes written as New York, you likely want to treat both as the same town.

    Then you would add a REPLACE to get rid of the space and change the condition to ...HAVING COUNT (DISTINCT SUBSTRING(REPLACE(Location,' ',''),'D+')) > 1

    In case this is not sufficient and you need further assistance, please explain all possible cases you need to cover.

    Login or Signup to reply.
  2. Hello another way to solve this in PostgreSql is by using Regexp_Peplace build in function:

    This is one way with the help of having clause:

    with data as (SELECT
        name
    , regexp_replace(location, '[0-9]', '', 'g') AS location
    FROM
        people_location
      )
    select name 
      from data
    group by name
    having min(location)<>max(location);
    

    This is another way using Exists operator:

    with data as (SELECT
        name
    , regexp_replace(location, '[0-9]', '', 'g') AS location
    FROM
        people_location
      )
    select distinct name 
    from data pl
    where exists (select * from data where pl.name=name and pl.location<>location);
    

    You may also check the fiddle created for this case.
    https://dbfiddle.uk/_cIOdooQ

    Hope it helps.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search