Given that I have these two following tables:
level_table
id | level_date | level |
---|---|---|
300 | 2021-05 | C |
300 | 2020-04 | C |
300 | 2019-08 | C |
300 | 2022-06 | C |
300 | 2019-02 | C |
300 | 2018-01 | C |
300 | 2023-05 | C |
300 | 2021-11 | C |
300 | 2021-09 | C |
300 | 2021-04 | C |
protocol_table
id | protc_date | protc |
---|---|---|
300 | 2021-03 | 1 |
300 | 2018-04 | 2 |
300 | 2021-08 | 3 |
300 | 2020-06 | 4 |
300 | 2019-01 | 5 |
300 | 2022-03 | 6 |
300 | 2019-05 | 7 |
300 | 2023-05 | 8 |
300 | 2019-06 | 9 |
300 | 2018-01 | 10 |
I’d like to create a third that would be the inner join of those two, on the condition that the level_date has to be <= protc_date, and the level_date has to be the max_value of that given ID, respecting the first condition.
So in this example, my desired output would be the following:
tbl_join
id | level_date | level | protc_date | protc |
---|---|---|---|---|
300 | 2020-04 | C | 2021-03 | 1 |
300 | 2018-01 | C | 2018-04 | 2 |
300 | 2021-05 | C | 2021-08 | 3 |
300 | 2020-04 | C | 2020-06 | 4 |
300 | 2018-01 | C | 2019-01 | 5 |
300 | 2021-11 | C | 2022-03 | 6 |
300 | 2019-02 | C | 2019-02 | 7 |
300 | 2023-05 | C | 2023-05 | 8 |
300 | 2019-02 | C | 2019-06 | 9 |
300 | 2018-01 | C | 2018-01 | 10 |
How can I create a query to perform this operation?
What i’ve tried so far is this query below, however it does not return me the desired output.
SELECT t1.*,
t2.protc_date,
t2.protc,
FROM level_table as t1
INNER JOIN protocol_table as t2
ON t1.id = t2.id
WHERE (t1.level_date <= t2.protc_date
AND t1.level_date = (
SELECT MAX(level_date)
FROM level_table as t1
WHERE t1.id = t2.id)
)
This query has returned me only one row, where it has the max(level_date) of my whole table, and not the max(level_date) for each protc and id.
id | level_date | level | protc_date | protc |
---|---|---|---|---|
300 | 2023-05 | C | 2023-05 | 8 |
2
Answers
The subquery is not "respecting the first condition" — it doesn’t restrict to
level_date <= protc_date
.DEMO
If you are running MySQL 8.0.13 or higher, I would recommend rewriting the query to use a lateral join; this simplifies the query and makes it more efficient, because it avoids opening the level table multiple times, as in your current code (once in the join, and another in a correlated subquery).
The subquery in the lateral join correlates the level table to the each row of the protocol table, sorts matching records by descending date and retains the top record only – which is then available in the outer query.