I’m looking to build an application that tracks the price changes of certain Amazon products over time. I came up with the following tables:
product
| ID | Name |
| 1 | iPhone case |
| 2 | Wallet |
product_price
| ID | product | price | date |
| 1 | 1 | 12.99 | 2023-03-04 |
| 1 | 2 | 10.99 | 2023-03-02 |
| 1 | 1 | 9.99 | 2023-03-01 |
But how would I now write an SQL query that retrieves all my products, and their latest price? Would this be possible at all given this table structure?
Another solution I could come up with, is to add another column to my product
table:
| ID | name | latest_price |
| 1 | iPhone case | 1 ]
| 2 | Wallet | NULL |
This way I can easily write a query to get products and their latest price. Downside is that whenever the price changes, I need to update this column as well.
Are there any best practices for this? Given that this doesn’t seem like a new problem.
2
Answers
To get the latest price you can use query like above
Using
distinct on ()
is quite efficient in Postgres:This can be used in a join:
Another option is to use a lateral join which sometimes is faster:
In any case an index on
product_price (product, date desc)
will speed up those queries.A different way to model this, is to use a daterange to define the time when the price is valid. This is a bit more complicated to update, as the "last" range must be closed before adding a new one. But it’s a bit more flexible to query (and might even be faster):
The exclusion constraint will prevent overlapping ranges for a single product.
You can then query the "latest price" (=the price of "today") using:
But this also makes it quite easy to look up the price for any date you need (e.g. the date when a product was ordered):
That query is much more expensive to run if you only store the start date.