skip to Main Content

I know similar Qs have been answered before here, but I’m unable to get the desired output based on answers I’ve reviewd.

I am trying to join two tables:

enter image description here

enter image description here

The objective is to join only records in Table A with the most recent date (A.last_run_date) for a given combination of dp_id AND pm_id, with records from Table B.

Desired Output

enter image description here

Here’s what I came up with, but this is unfortunately not working. The query is returning all the rows, instead of the most recent date (A.last_run_date) for a given combination of dp_id AND pm_id:

SELECT b.a_i_id,
       b.a_e_id,
       a.last_run_date,
       a.cm_id,
       a.dp_id
FROM   (SELECT dp_id,
               cm_id,
               Max (last_run_date) AS latest_run
        FROM   a
        GROUP  BY dp_id,
                  cm_id) A1
       INNER JOIN a
               ON a.dp_id = A1.dp_id
                  AND a.cm_id = A1.cm_id
       INNER JOIN b
               ON b.dp_id = a.dp_id
                  AND b.cm_id = a.dp_id
WHERE  Cast (a.last_run_date AS DATE) BETWEEN '2023-04-28' AND '2023-04-30' 

Can someone please advise what I’m doing wrong?

Thanks

2

Answers


  1. Try this. I just wrote the query out of mind and not tested. You may need to tweak it, but I hope you get the idea.

    WITH A1 AS (SELECT MAX(last_run_date) max_last_run_date, 
        cm_id, dp_id
        GROUP BY cm_id, dp_id
        )
    SELECT b.a_i_id, b.a_e_id, a1.last_run_date, a1.cm_id, a1.dp_id
        FROM B 
        INNER JOIN A1 
        ON B.dp_id = A1.dp_id AND B.cm_id = A.dp_id
    
    Login or Signup to reply.
  2. Your friend to get a most recent row is the analytic function ROW_NUMBER.

    Define the partition by with the columns that you will be joining and the order by on the column defining the order using DESC, so that the candidate (highest) row will get the number 1.

    Finally filter only row with rn = 1. The join is than a trivial case (leaving it out)

    select 
      last_run_date, dp_id, pm_id,
      row_number() over (partition by dp_id, pm_id order by last_run_date desc) rn
    from t;
    
    last_run_date|dp_id|pm_id|rn|
    -------------+-----+-----+--+
       2023-04-24|   24| 1000| 1|
       2023-04-23|   24| 1000| 2|
       2023-04-26|   24| 1005| 1|
       2023-04-25|   24| 1005| 2|
       2023-04-27|   25| 1023| 1|
       2023-04-28|   25| 1024| 1|
       
    with t2 as (
    select 
      last_run_date, dp_id, pm_id,
      row_number() over (partition by dp_id, pm_id order by last_run_date desc) rn
    from t
    )
    select 
      last_run_date, dp_id, pm_id
    from t2
    where rn = 1;
    
    last_run_date|dp_id|pm_id|
    -------------+-----+-----+
       2023-04-24|   24| 1000|
       2023-04-26|   24| 1005|
       2023-04-27|   25| 1023|
       2023-04-28|   25| 1024|
    

    Note you should always check in this aproach that the columns from partition by and order by are unique in the table (i.e. the last row is defined stable). Otherwise the query will work, but you can get in each execution a different result.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search