skip to Main Content

Here is the first table A in Redshift:

team     origin_id     targte_id
a        1             11
b        2             22
b        NULL          33
c        5             55
c        NULL          66

The origin_id and target_id could be NULL, but they couldn’t both NULL in one row.

And the second table B:

team     origin_id     targte_id     content
a        1             11            aaa
a        1             11            bbb
b        NULL          22            xxx
c        5             NULL          zzz

I want to join these 2 tables and match the value of content to the table A. The match rule is: A.team_id=B.team_id; if the origin_id is not null, then A.origin_id=B.origin_id; if the origin_id is null, then A.target_id=B.target_id. And the row count of the final result should be the same as the table A.

The final result set I wanted is:

team     origin_id     targte_id     content
a        1             11            aaa
b        2             22            xxx
b        NULL          33            NULL
c        5             55            zzz
c        NULL          66            NULL

(For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.)

What’s the correct SQL to achieve this in Redshift? I try left join but the result looks like a Cartesian Product of these 2 tables.

2

Answers


  1. Left join is correct. However, in your example where

    For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.

    There is no way to "just choose any one", a JOIN operation will match both rows. It is up to you to de-dupe the result set afterwards.

    For example, you can group by A.team_id, A.origin_id, A.target_id:

    SELECT
      team_id,
      origin_id,
      target_id,
      MIN(content) any_content
    FROM (
      SELECT
        A.team_id,
        A.origin_id,
        A.target_id,
        B.content
      FROM
        A LEFT JOIN B
      ON
        A.team_id = B.team_id AND
        COALESCE(A.origin_id, 'is_null') = COALESCE(B.origin_id, 'is_null') AND
        COALESCE(A.target_id, 'is_null') = COALESCE(B.target_id, 'is_null')
    )
    GROUP BY
      team_id,
      origin_id,
      target_id
    
    Login or Signup to reply.
  2. The query joins Table A and Table B using a LEFT JOIN with conditional matching: origin_id is matched if not NULL, otherwise targte_id is matched. The ROW_NUMBER function ensures only one match is selected per row in Table A when multiple matches exist. If no match is found, the LEFT JOIN ensures NULL is retained for content. This guarantees the final result has the same number of rows as Table A, with a single matching content (or NULL) for each row.

    below is a simple code you can try

        WITH RankedMatches AS (
        SELECT 
            A.team,
            A.origin_id,
            A.targte_id,
            B.content,
            ROW_NUMBER() OVER (
                PARTITION BY A.team, A.origin_id, A.targte_id
                ORDER BY B.content -- Arbitrary order; adjust as needed
            ) AS rn
        FROM 
            TableA A
        LEFT JOIN 
            TableB B
        ON 
            A.team = B.team
            AND (
                (A.origin_id IS NOT NULL AND A.origin_id = B.origin_id) OR
                (A.origin_id IS NULL AND A.targte_id = B.targte_id)
            )
    )
    SELECT
        team,
        origin_id,
        targte_id,
        CASE WHEN rn = 1 THEN content ELSE NULL END AS content
    FROM 
        RankedMatches
    WHERE 
        rn = 1 -- Ensures only one row per match
    ORDER BY 
        team, origin_id, targte_id;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search