
Suppose a PostgreSQL table, articles, contains two nullable string columns, name and alt_name.
I want to find the rows in the table for which

  • the combination of name and alt_name matches the combination in another row of the same table:
    • i.e., [a.name, a.alt_name] is equal to either [b.name, b.alt_name] or [b.alt_name, b.name]
  • where name or alt_name may be NULL or an empty string, and NULL and an empty string should always be treated as identical;
    • e.g., when [a.name, a.alt_name] == ["abc", NULL], a row with [b.name, b.alt_name] == ["", "abc"] should match, because one value is "abc" and the other is NULL or an empty string.

Is there any neat query to achieve this?

I thought that if there were a way to concatenate both columns with a Unicode replacement character (U+FFFD) in between, with NULL converted into an empty string, that would solve the problem. Say the function were magic_fn(); the following would then do the job, provided there is a unique column id:

SELECT * FROM articles a INNER JOIN articles b ON a.id <> b.id
  WHERE
        magic_fn(a.name, a.alt_name) =  magic_fn(b.name, b.alt_name)
     OR magic_fn(a.name, a.alt_name) =  magic_fn(b.alt_name, b.name);


-- [EDIT] corrected from the original post, which was simply wrong.

However, concatenation is not a built-in function in PostgreSQL and I don’t know how to do this.
[EDIT] As commented by @Serg and in the answers, string concatenation is available in PostgreSQL: the CONCAT() function from Ver.9.1, and the || operator; n.b., || actually accepts non-string input as long as one of the operands is a string type, as of Ver.15.
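
As a small illustration of the difference between the two (a sketch with literal values, not taken from the original post): CONCAT() skips NULL arguments, whereas || yields NULL as soon as one operand is NULL, so the operator needs COALESCE() to treat NULL as an empty string.

```sql
-- CONCAT() ignores NULL arguments; || propagates NULL.
SELECT CONCAT('abc', NULL)         AS with_concat,    -- 'abc'
       'abc' || NULL               AS with_operator,  -- NULL
       'abc' || COALESCE(NULL, '') AS with_coalesce;  -- 'abc'
```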

Or, maybe there is simply a better way?

5 Answers


  1. Chosen as BEST ANSWER

    Having reviewed a few answers (special thanks to @MitkoKeckaroski), I have come up with this short solution. COALESCE() is not necessary!

    The condition is that the Unicode replacement character (U+FFFD) never appears in the data, which should be a safe assumption given its purpose in the Unicode specification.

    SELECT * FROM articles a JOIN articles b
    ON a.id <> b.id AND
      ARRAY[CONCAT(a.name, U&'\FFFD', a.alt_name),
            CONCAT(a.alt_name, U&'\FFFD', a.name)] @>
      ARRAY[CONCAT(b.name, U&'\FFFD', b.alt_name)];
    

    See db<>fiddle (where I extended the data prepared by @Ajax1234 – thank you!)
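
    The example from the question can be checked without any table by plugging literal values into the same expression. This is only a sketch: the values stand in for a hypothetical row a = ["abc", NULL] and a row b = ["", "abc"].

    ```sql
    -- Row a is ('abc', NULL); row b is ('', 'abc').
    -- CONCAT() treats NULL as an empty string, so the swapped form of a
    -- equals the concatenation of b and the containment test succeeds.
    SELECT ARRAY[CONCAT('abc', U&'\FFFD', NULL),
                 CONCAT(NULL, U&'\FFFD', 'abc')]
           @> ARRAY[CONCAT('', U&'\FFFD', 'abc')] AS is_match;  -- true
    ```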


  2. Try this:

    SELECT *
    FROM articles a
    CROSS JOIN articles b
    WHERE ARRAY[COALESCE(a.name,''), COALESCE(a.alt_name,'')] @> ARRAY[COALESCE(b.name,''), COALESCE(b.alt_name,'')]
      AND ARRAY[COALESCE(a.name,''), COALESCE(a.alt_name,'')] <@ ARRAY[COALESCE(b.name,''), COALESCE(b.alt_name,'')]
      AND a.id <> b.id
      AND a.id < b.id;  -- optional (to avoid reverse matching)
    

    db<>fiddle

  3. You can create a function which takes in the name and alt_name, then returns an aggregated string with nulls converted to empty strings and the results sorted:

    create function magic_fn(a text, b text) returns text
      return (select json_agg(t.v) from (
        select t1.* from (
          select coalesce(a, '') v
          union all
          select coalesce(b, '') v) t1 
        order by t1.v) t);
    create table articles (id int, name text, alt_name text);
    insert into articles values (1, 'abc', null), (2, 'abc', ''), (3, null, 'abc'), (4, 'aaa', 'a'), (5, 'aaa', 'a'), (6, 'a', 'aaa');
    

    Usage:

    select * from articles a join articles b
    on a.id <> b.id and magic_fn(a.name, a.alt_name) = magic_fn(b.name, b.alt_name);
    

    See fiddle

  4. You can try to use

    • coalesce to convert NULL to an empty string
    • || to concatenate strings

    and then compare the strings like this:

    (coalesce(a.name,'') || coalesce(a.alt_name,'')) =  (coalesce(b.name,'') || coalesce(b.alt_name,'')) 
     or 
     (coalesce(a.name,'') || coalesce(a.alt_name,'')) =  (coalesce(b.alt_name,'') || coalesce(b.name,'')) 
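
    One caveat worth checking with this approach: without a separator between the two parts, different pairs can collapse into the same string. The sketch below uses made-up literal values to show the effect, and how a separator that is assumed never to occur in the data (here U+FFFD, as suggested in the question) avoids it.

    ```sql
    -- ('ab', 'c') and ('a', 'bc') are different pairs, yet both
    -- concatenate to 'abc' when no separator is used.
    SELECT ('ab' || 'c') = ('a' || 'bc')                               AS without_separator,  -- true
           ('ab' || U&'\FFFD' || 'c') = ('a' || U&'\FFFD' || 'bc')     AS with_separator;     -- false
    ```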
    
  5. You can create an array from both names, remove NULL and empty values, then check whether the arrays overlap (have elements in common):

    select *
    from articles
    where array_remove(array[nullif(name,''), nullif(alt_name,'')], null) && array['abc']
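
    To see what the left-hand side evaluates to, here is a sketch with literal values standing in for a hypothetical row whose name is an empty string and whose alt_name is 'abc':

    ```sql
    -- nullif('', '') turns the empty string into NULL,
    -- and array_remove(..., NULL) then drops it from the array.
    SELECT array_remove(ARRAY[nullif('', ''), nullif('abc', '')], NULL) AS names;  -- {abc}
    ```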
    

    This can be made easier by creating a function that generates such an array:

    create or replace function combine_names(p_names variadic text[]) 
      returns text[]
    as
    $$
      select array_agg(name)
      from unnest(p_names) as x(name)
      where nullif(trim(name),'') is not null;
    $$ 
    language sql
    immutable
    called on null input;  
    

    By making the parameter variadic, it’s possible to pass a varying number of arguments (in theory even more than two):

    select *
    from articles
    where combine_names(name, alt_name) && combine_names('abc')
    
    
    select *
    from articles
    where combine_names(name, alt_name) && combine_names('abc', null)
    
    
    select *
    from articles
    where combine_names(name, alt_name) && combine_names('abc', 'def')
    