It has been really hard for me to write a title for the question, but here is my situation: I have a bot that saves from a public API the prices of gas tagged by gas station id, time of price change, and type of fuel (it can be, for example, "Gas", "Diesel", "Natural Gas", "Special Gas", etc.).
The table gets updated every day: every time the manager of a station communicates a new price, a new record is added. I need to keep track of prices variations, this is the reason why I am not updating the record for that particular fuel type, but adding a new record.
The table looks like this (more or less, the real table is a little bit more complex):
id | station_id | updated_timestamp | price | type_of_fuel |
---|---|---|---|---|
1 | 1 | 2023-01-19 | 1.00 | Gas |
2 | 1 | 2023-01-19 | 1.20 | Diesel |
3 | 2 | 2023-01-19 | 1.05 | Gas |
4 | 1 | 2023-01-20 | 1.10 | Gas |
5 | 2 | 2023-01-20 | 1.10 | Gas |
6 | 1 | 2023-01-21 | 1.15 | Gas |
One of my use cases is to return the historical data, and that is no problem, but I also need to show only the latest available price for each station and each type of fuel. In other words, I expect to receive something like
id | station_id | updated_timestamp | price | type_of_fuel |
---|---|---|---|---|
2 | 1 | 2023-01-19 | 1.20 | Diesel |
5 | 2 | 2023-01-20 | 1.10 | Gas |
6 | 1 | 2023-01-21 | 1.15 | Gas |
Starting from a basic query
SELECT
*
FROM
TABLE
I understood that a JOIN with a subquery might be the solution, something like:
SELECT
*
FROM
TABLE
AS
T
INNER JOIN (
SELECT
id,
station_id,
MAX(updated_timestamp)
FROM
TABLE
GROUP BY
station_id
) AS SUB ON T.id = SUB.id
But this is not working as expected. Any idea? I’d like to be able to write this query, and to understand why it works.
Thank you in advance.
2
Answers
Since MySQL 8, you can use the
ROW_NUMBER
window function for this kind of task.Check the demo here.
In MySQL 5.X you are forced to compute aggregation separately. Since you want the last price for each station_id and type of fuel, you need to look for the last id, in partition <station_id, type_of_fuel>. Then join back on matching ids.
Check the demo here.
Note: If the "ID" field gives you a correct ordering with respect to your data, it’s preferable to use it, as long as conditions on integers make join operations more efficient than conditions on dates. If that’s not the case, you are forced to use dates instead, changing:
MAX(id) AS id
toMAX(updated_timestamp) AS updated_timestamp
ON tab.id = cte.id
toON tab.updated_timestamp = cte.updated_timestamp
.Hope this works.