skip to Main Content

It has been really hard for me to write a title for the question, but here is my situation: I have a bot that saves from a public API the prices of gas tagged by gas station id, time of price change, and type of fuel (it can be, for example, "Gas", "Diesel", "Natural Gas", "Special Gas", etc.).

The table gets updated every day: every time the manager of a station communicates a new price, a new record is added. I need to keep track of prices variations, this is the reason why I am not updating the record for that particular fuel type, but adding a new record.

The table looks like this (more or less, the real table is a little bit more complex):

id station_id updated_timestamp price type_of_fuel
1 1 2023-01-19 1.00 Gas
2 1 2023-01-19 1.20 Diesel
3 2 2023-01-19 1.05 Gas
4 1 2023-01-20 1.10 Gas
5 2 2023-01-20 1.10 Gas
6 1 2023-01-21 1.15 Gas

One of my use cases is to return the historical data, and that is no problem, but I also need to show only the latest available price for each station and each type of fuel. In other words, I expect to receive something like

id station_id updated_timestamp price type_of_fuel
2 1 2023-01-19 1.20 Diesel
5 2 2023-01-20 1.10 Gas
6 1 2023-01-21 1.15 Gas

Starting from a basic query

SELECT
  *
FROM
  TABLE

I understood that a JOIN with a subquery might be the solution, something like:

SELECT
  *
FROM
  TABLE
AS
  T
INNER JOIN (
  SELECT
    id,
    station_id,
    MAX(updated_timestamp)
  FROM
    TABLE
  GROUP BY
    station_id
) AS SUB ON T.id = SUB.id

But this is not working as expected. Any idea? I’d like to be able to write this query, and to understand why it works.

Thank you in advance.

2

Answers


  1. Since MySQL 8, you can use the ROW_NUMBER window function for this kind of task.

    WITH cte AS (
        SELECT *, 
               ROW_NUMBER() OVER(PARTITION BY station_id, type_of_fuel ORDER BY id DESC) AS rn
        FROM tab
    )
    SELECT id, station_id, updated_timestamp, price, type_of_fuel 
    FROM cte
    WHERE rn = 1
    

    Check the demo here.


    In MySQL 5.X you are forced to compute aggregation separately. Since you want the last price for each station_id and type of fuel, you need to look for the last id, in partition <station_id, type_of_fuel>. Then join back on matching ids.

    SELECT tab.*
    FROM tab
    INNER JOIN (SELECT MAX(id) AS id
                FROM tab
                GROUP BY station_id, type_of_fuel) cte
            ON tab.id = cte.id
    

    Check the demo here.

    Note: If the "ID" field gives you a correct ordering with respect to your data, it’s preferable to use it, as long as conditions on integers make join operations more efficient than conditions on dates. If that’s not the case, you are forced to use dates instead, changing:

    • MAX(id) AS id to MAX(updated_timestamp) AS updated_timestamp
    • ON tab.id = cte.id to ON tab.updated_timestamp = cte.updated_timestamp.
    Login or Signup to reply.
  2. Hope this works.

    with tsample as (
    select '1' as id,'1' as station_id,'2023-01-19' as updated_time,'1.00' as price,'Gas' as type union all
    select '2' as id,'1' as station_id,'2023-01-19' as updated_time,'1.20' as price,'Diesel' as type union all
    select '3' as id,'2' as station_id,'2023-01-19' as updated_time,'1.05' as price,'Gas' as type union all
    select '4' as id,'1' as station_id,'2023-01-20' as updated_time,'1.10' as price,'Gas' as type union all
    select '5' as id,'2' as station_id,'2023-01-20' as updated_time,'1.10' as price,'Gas' as type union all
    select '6' as id,'1' as station_id,'2023-01-21' as updated_time,'1.15' as price,'Gas' as type 
    )
    select station_id,type,max(price),max(updated_time),max(id) from tsample group by 1,2
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search