Amazon web services - How to match these 2 tables?

Rinze
January 20, 2025
46 views
1 vote
2 Answers

Here is the first table A in Redshift:

team     origin_id     targte_id
a        1             11
b        2             22
b        NULL          33
c        5             55
c        NULL          66

The origin_id and target_id could be NULL, but they couldn’t both NULL in one row.

And the second table B:

team     origin_id     targte_id     content
a        1             11            aaa
a        1             11            bbb
b        NULL          22            xxx
c        5             NULL          zzz

I want to join these 2 tables and match the value of content to the table A. The match rule is: A.team_id=B.team_id; if the origin_id is not null, then A.origin_id=B.origin_id; if the origin_id is null, then A.target_id=B.target_id. And the row count of the final result should be the same as the table A.

The final result set I wanted is:

team     origin_id     targte_id     content
a        1             11            aaa
b        2             22            xxx
b        NULL          33            NULL
c        5             55            zzz
c        NULL          66            NULL

(For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.)

What’s the correct SQL to achieve this in Redshift? I try left join but the result looks like a Cartesian Product of these 2 tables.

Answers

- WinsonTanputraman
- January 20, 2025 at 8:31 am
- 0 votes
0
Left join is correct. However, in your example where

For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.

There is no way to "just choose any one", a JOIN operation will match both rows. It is up to you to de-dupe the result set afterwards.

For example, you can group by A.team_id, A.origin_id, A.target_id:
```
SELECT
  team_id,
  origin_id,
  target_id,
  MIN(content) any_content
FROM (
  SELECT
    A.team_id,
    A.origin_id,
    A.target_id,
    B.content
  FROM
    A LEFT JOIN B
  ON
    A.team_id = B.team_id AND
    COALESCE(A.origin_id, 'is_null') = COALESCE(B.origin_id, 'is_null') AND
    COALESCE(A.target_id, 'is_null') = COALESCE(B.target_id, 'is_null')
)
GROUP BY
  team_id,
  origin_id,
  target_id
```
Login or Signup to reply.

- AgnelSalve
- January 20, 2025 at 11:31 am
- 0 votes
0
The query joins Table A and Table B using a LEFT JOIN with conditional matching: origin_id is matched if not NULL, otherwise targte_id is matched. The ROW_NUMBER function ensures only one match is selected per row in Table A when multiple matches exist. If no match is found, the LEFT JOIN ensures NULL is retained for content. This guarantees the final result has the same number of rows as Table A, with a single matching content (or NULL) for each row.

below is a simple code you can try
```
    WITH RankedMatches AS (
    SELECT 
        A.team,
        A.origin_id,
        A.targte_id,
        B.content,
        ROW_NUMBER() OVER (
            PARTITION BY A.team, A.origin_id, A.targte_id
            ORDER BY B.content -- Arbitrary order; adjust as needed
        ) AS rn
    FROM 
        TableA A
    LEFT JOIN 
        TableB B
    ON 
        A.team = B.team
        AND (
            (A.origin_id IS NOT NULL AND A.origin_id = B.origin_id) OR
            (A.origin_id IS NULL AND A.targte_id = B.targte_id)
        )
)
SELECT
    team,
    origin_id,
    targte_id,
    CASE WHEN rn = 1 THEN content ELSE NULL END AS content
FROM 
    RankedMatches
WHERE 
    rn = 1 -- Ensures only one row per match
ORDER BY 
    team, origin_id, targte_id;
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Amazon web services – How to match these 2 tables?

Answers