skip to Main Content

I’m looking to build an application that tracks the price changes of certain Amazon products over time. I came up with the following tables:

product

| ID | Name        |
| 1  | iPhone case |
| 2  | Wallet      | 

product_price

| ID | product | price | date |
| 1  |    1    | 12.99 | 2023-03-04 |
| 1  |    2    | 10.99 | 2023-03-02 |
| 1  |    1    | 9.99  | 2023-03-01 |

But how would I now write an SQL query that retrieves all my products, and their latest price? Would this be possible at all given this table structure?

Another solution I could come up with, is to add another column to my product table:

| ID | name        | latest_price |
| 1  | iPhone case | 1            ]
| 2  | Wallet      | NULL         |

This way I can easily write a query to get products and their latest price. Downside is that whenever the price changes, I need to update this column as well.

Are there any best practices for this? Given that this doesn’t seem like a new problem.

2

Answers


  1. SELECT X.PRODUCT,X.PRICE,X.DATE 
     FROM
     ( 
       SELECT P.PRODUCT,P.PRICE,P.DATE,
       ROW_NUMBER()OVER(PARTITION BY P.PRODUCT ORDER BY P.DATE DESC,P.ID DESC)AS XCOL
       FROM product_price AS P
     ) AS X WHERE X.XCOL=1
    

    To get the latest price you can use query like above

    Login or Signup to reply.
  2. Using distinct on () is quite efficient in Postgres:

    select distinct on (product) pp.*
    from product_price pp
    order by product, date desc;
    

    This can be used in a join:

    select p.*, pp.price as latest_price
    from product p
      join (
        select distinct on (product) *
        from product_price pp
        order by product, date desc
      ) pp on pp.product = p.id
    

    Another option is to use a lateral join which sometimes is faster:

    select p.*, pp.price as latest_price
    from product p
      join lateral (
        select *
        from product_price pp
        where pp.product = p.id
        order by pp.date desc
        limit 1
      ) pp on true
    

    In any case an index on product_price (product, date desc) will speed up those queries.


    A different way to model this, is to use a daterange to define the time when the price is valid. This is a bit more complicated to update, as the "last" range must be closed before adding a new one. But it’s a bit more flexible to query (and might even be faster):

    create table product_price 
    (
      product int not null references product, 
      price numeric, 
      valid_during daterange,
      constraint unique_price_range 
         exclude using gist(product with =, valid_during with &&)
    );
    

    The exclusion constraint will prevent overlapping ranges for a single product.

    You can then query the "latest price" (=the price of "today") using:

    select *
    from product_price
    where valid_during @> current_date;
    

    But this also makes it quite easy to look up the price for any date you need (e.g. the date when a product was ordered):

    select *
    from product_price
    where valid_during @> date '2023-03-18';
    

    That query is much more expensive to run if you only store the start date.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search