Suppose a PostgreSQL table, articles
, contains two nullable String columns of name
and alt_name
.
Now, I want to find records (rows) in the table that have
- a combination of String
name
andalt_name
matches another combination of the same type in the same table:- i.e.,
[a.name, a.alt_name]
is equal to either[b.name, b.alt_name]
or[b.alt_name, b.name]
- i.e.,
- where
name
oralt_name
may beNULL
or an empty String, and in any circumstancesNULL
and an empty String should be treated as identical;- e.g., when
[a.name, a.alt_name] == ["abc", NULL]
, a record of[b.name, b.alt_name] == ["", "abc"]
should match, because one of them is"abc"
and the other is NULL or empty String.
- e.g., when
Is there any neat query to achieve this?
I thought if there is a way to concatenate both columns with a UTF-8 replacement character (U+FFFD
) in between, where NULL is converted into an empty String, that would solve the problem. Say, if the function were magic_fn()
, the following would do a job, providing there is a unique column id
:
SELECT * FROM articles a INNER JOIN places b ON a.id <> b.id
WHERE
magic_fn(a.name, a.alt_name) = magic_fn(b.name, b.alt_name)
OR magic_fn(a.name, a.alt_name) = magic_fn(b.alt_name, b.name);
-- [EDIT] corrected from the original post, which was simply wrong.
However, concatnation is not a built-in function in PostgreSQL and I don’t know how to do this.
[EDIT] As commented by @Serg and in answers, a string-concatnation function is now available in PostgreSQL from Ver.9.1 (CONCAT or ||); n.b., it actually accepts non-String input as long as one of them is a String-type as of Ver.15.
Or, maybe there is simply a better way?
5
Answers
Having reviewed a few answers (special thanks to @MitkoKeckaroski), I have come up with this short solution. COALESCE() is not necessary!
The condition is that the UTF replacement character (
U+FFFD
) should never appear in the data record, which you can safely assume according to the Unicode specification.See db<>fiddle (where I extended the data prepared by @Ajax1234 – thank you!)
try this
db<>fiddle
You can create a function which takes in the
name
andalt_name
, then returns an aggregated string withnull
s converted to empty strings and the results sorted:Usage:
See fiddle
you can try to use
and then compare string like this sql:
You can create an array from both names, remove null and empty values, then check if the arrays overlap (have elements in common)
This can be made easier by creating a function that generates such an array:
By making the parameter
variadic
it’s possible to provide a different number of arguments (in theory even more than two)