How to find records in PostgreSQL matching a combination of a pair of nullable String columns

MasaSakano
November 11, 2022
233 views
5 votes
5 Answers

Suppose a PostgreSQL table, articles, contains two nullable String columns of name and alt_name.
Now, I want to find records (rows) in the table that have

a combination of String name and alt_name matches another combination of the same type in the same table:
- i.e., [a.name, a.alt_name] is equal to either [b.name, b.alt_name] or [b.alt_name, b.name]
where name or alt_name may be NULL or an empty String, and in any circumstances NULL and an empty String should be treated as identical;
- e.g., when [a.name, a.alt_name] == ["abc", NULL], a record of [b.name, b.alt_name] == ["", "abc"] should match, because one of them is "abc" and the other is NULL or empty String.

Is there any neat query to achieve this?

I thought if there is a way to concatenate both columns with a UTF-8 replacement character (U+FFFD) in between, where NULL is converted into an empty String, that would solve the problem. Say, if the function were magic_fn(), the following would do a job, providing there is a unique column id:

SELECT * FROM articles a INNER JOIN places b ON a.id <> b.id
  WHERE
        magic_fn(a.name, a.alt_name) =  magic_fn(b.name, b.alt_name)
     OR magic_fn(a.name, a.alt_name) =  magic_fn(b.alt_name, b.name);


-- [EDIT] corrected from the original post, which was simply wrong.

However, ~~concatnation is not a built-in function in PostgreSQL~~ and I don’t know how to do this.
[EDIT] As commented by @Serg and in answers, a string-concatnation function is now available in PostgreSQL from Ver.9.1 (CONCAT or ||); n.b., it actually accepts non-String input as long as one of them is a String-type as of Ver.15.

Or, maybe there is simply a better way?

Answers

Chosen as BEST ANSWER
- MasaSakano
- November 12, 2022 at 7:31 pm
- 0 votes
0
Having reviewed a few answers (special thanks to @MitkoKeckaroski), I have come up with this short solution. COALESCE() is not necessary!

The condition is that the UTF replacement character (U+FFFD) should never appear in the data record, which you can safely assume according to the Unicode specification.
```
SELECT * FROM articles a JOIN articles b 
ON a.id <> b.id AND
  ARRAY[CONCAT(a.name, U&'FFFD', a.alt_name), 
        CONCAT(a.alt_name, U&'FFFD', a.name)] @>
  ARRAY[CONCAT(b.name, U&'FFFD', b.alt_name)];
```
See db<>fiddle (where I extended the data prepared by @Ajax1234 – thank you!)

(Edit)

- MitkoKeckaroski
- November 11, 2022 at 3:17 pm
- 0 votes
0
try this
```
SELECT  *   FROM articles a
cross join articles b    
where  
(ARRAY[COALESCE(a.name,''),COALESCE(a.alt_name,'')] @>  ARRAY[COALESCE(b.name,''),COALESCE(b.alt_name,'')])  
and (ARRAY[COALESCE(a.name,''),COALESCE(a.alt_name,'')] <@  ARRAY[COALESCE(b.name,''),COALESCE(b.alt_name,'')]) 
and a.id<>b.id
and a.id<b.id  --optional (to avoid reverse matching) 
```
db<>fiddle
Login or Signup to reply.

You can create a function which takes in the name and alt_name, then returns an aggregated string with nulls converted to empty strings and the results sorted:

create function magic_fn(a text, b text) returns text
  return (select json_agg(t.v) from (
    select t1.* from (
      select coalesce(a, '') v
      union all
      select coalesce(b, '') v) t1 
    order by t1.v) t);
create table articles (id int, name text, alt_name text);
insert into articles values (1, 'abc', null), (2, 'abc', ''), (3, null, 'abc'), (4, 'aaa', 'a'), (5, 'aaa', 'a'), (6, 'a', 'aaa')

Usage:

select * from articles a join articles b 
on a.id <> b.id and magic_fn(a.name, a.alt_name) = magic_fn(b.name, b.alt_name)

See fiddle

- MMAARR
- November 11, 2022 at 3:18 pm
- 0 votes
0
you can try to use
- coalesce for convert null to empty
- || for concatenate string
and then compare string like this sql:
```
(coalesce(a.name,'') || coalesce(a.altname,'')) =  (coalesce(b.name,'') || coalesce(b.altname,'')) 
 or 
 (coalesce(a.name,'') || coalesce(a.altname,'')) =  (coalesce(b.altname,'') || coalesce(b.name,'')) 
```
Login or Signup to reply.

You can create an array from both names, remove null and empty values, then check if the arrays overlap (have elements in common)

select *
from articles
where array_remove(array[nullif(name,''), nullif(alt_name,'')], null) && array['abc']

This can be made easier by creating a function that generates such an array:

create or replace function combine_names(p_names variadic text[]) 
  returns text[]
as
$$
  select array_agg(name)
  from unnest(p_names) as x(name)
  where nullif(trim(name),'') is not null;
$$ 
language sql
immutable
called on null input;

By making the parameter variadic it’s possible to provide a different number of arguments (in theory even more than two)

select *
from articles
where combine_names(name, alt_name) && combine_names('abc')


select *
from articles
where combine_names(name, alt_name) && combine_names('abc', null)


select *
from articles
where combine_names(name, alt_name) && combine_names('abc', 'def')

Please signup or login to give your own answer.

Click here to cancel reply.