Here is the first table A
in Redshift:
team origin_id targte_id
a 1 11
b 2 22
b NULL 33
c 5 55
c NULL 66
The origin_id
and target_id
could be NULL, but they couldn’t both NULL in one row.
And the second table B
:
team origin_id targte_id content
a 1 11 aaa
a 1 11 bbb
b NULL 22 xxx
c 5 NULL zzz
I want to join these 2 tables and match the value of content
to the table A
. The match rule is: A.team_id
=B.team_id
; if the origin_id
is not null, then A.origin_id
=B.origin_id
; if the origin_id
is null, then A.target_id
=B.target_id
. And the row count of the final result should be the same as the table A
.
The final result set I wanted is:
team origin_id targte_id content
a 1 11 aaa
b 2 22 xxx
b NULL 33 NULL
c 5 55 zzz
c NULL 66 NULL
(For the first row team=a
, it matches 2 row in table B
; just choose any one in these matched rows, and it is OK.)
What’s the correct SQL to achieve this in Redshift? I try left join
but the result looks like a Cartesian Product of these 2 tables.
2
Answers
Left join is correct. However, in your example where
There is no way to "just choose any one", a JOIN operation will match both rows. It is up to you to de-dupe the result set afterwards.
For example, you can group by A.team_id, A.origin_id, A.target_id:
The query joins Table A and Table B using a LEFT JOIN with conditional matching: origin_id is matched if not NULL, otherwise targte_id is matched. The ROW_NUMBER function ensures only one match is selected per row in Table A when multiple matches exist. If no match is found, the LEFT JOIN ensures NULL is retained for content. This guarantees the final result has the same number of rows as Table A, with a single matching content (or NULL) for each row.
below is a simple code you can try