I have the following two tables:
CREATE TABLE products
(
id INT,
created_at DATE,
sold_at DATE
);
CREATE TABLE product_prices
(
id INT,
product_id INT,
price numeric,
created_at DATE
);
The data model logic works as follows:
- When a new product is put for sale, a record is inserted into
products
with the current date ascreated_at
. - At the same time, a record is inserted into
product_prices
with the same date increated_at
, an FK reference to the product and aprice
. - If a product is sold, the
sold_at
is set on theproducts
record. - If the product changes price throughout its sales period a new record is added to
product_prices
with thecreated_at
date theprice
was changed. This means, that if you wanna know what price a product has at a given date, then you need to check what the price was at that date by looking atproduct_prices
.
Now imagine that I have seed data something like this:
SELECT * FROM products;
id | created_at | sold_at |
---|---|---|
1 | 2022-01-25T00:00:00.000Z | 2022-02-18T00:00:00.000Z |
2 | 2022-01-26T00:00:00.000Z | |
3 | 2022-01-28T00:00:00.000Z | 2022-01-30T00:00:00.000Z |
4 | 2022-02-01T00:00:00.000Z | 2022-02-01T00:00:00.000Z |
5 | 2022-02-01T00:00:00.000Z | 2022-02-15T00:00:00.000Z |
6 | 2022-02-10T00:00:00.000Z | 2022-02-13T00:00:00.000Z |
7 | 2022-02-14T00:00:00.000Z | |
8 | 2022-02-19T00:00:00.000Z | |
9 | 2022-02-20T00:00:00.000Z | 2022-02-22T00:00:00.000Z |
10 | 2022-02-22T00:00:00.000Z |
and
SELECT * FROM product_prices;
id | product_id | price | created_at |
---|---|---|---|
1 | 1 | 100.0 | 2022-01-25T00:00:00.000Z |
2 | 1 | 95.0 | 2022-02-02T00:00:00.000Z |
3 | 1 | 85.0 | 2022-02-17T00:00:00.000Z |
4 | 2 | 89.0 | 2022-01-26T00:00:00.000Z |
5 | 2 | 85.0 | 2022-01-30T00:00:00.000Z |
6 | 3 | 91.0 | 2022-01-28T00:00:00.000Z |
7 | 4 | 50.0 | 2022-02-01T00:00:00.000Z |
8 | 5 | 100.0 | 2022-02-01T00:00:00.000Z |
9 | 5 | 99.0 | 2022-02-03T00:00:00.000Z |
10 | 6 | 79.0 | 2022-02-10T00:00:00.000Z |
11 | 6 | 75.0 | 2022-02-11T00:00:00.000Z |
12 | 6 | 71.0 | 2022-02-12T00:00:00.000Z |
13 | 7 | 120.0 | 2022-02-14T00:00:00.000Z |
14 | 7 | 110.0 | 2022-02-16T00:00:00.000Z |
15 | 8 | 89.0 | 2022-02-19T00:00:00.000Z |
16 | 9 | 30.0 | 2022-02-20T00:00:00.000Z |
17 | 9 | 29.0 | 2022-02-22T00:00:00.000Z |
18 | 10 | 100.0 | 2022-02-22T00:00:00.000Z |
I want to know what was the average price
and the number of products for sale
and the number of sold products
on a daily basis between 2022-01-23
and 2022-02-23
.
In pseudo SQL it would be something like:
SELECT
COUNT(products_for_sale_this_day),
COUNT(products_sold_this_day),
AVG(price_of_products_for_sale_on_this_day)
FROM
products ...
WHERE
date "is between 2022-01-23 and 2022-02-23"
GROUP BY
"dates in between"`
The result I would expect from the seed data would be:
Products for sale | Number of sold | Avg price | Date |
---|---|---|---|
0 | 0 | 0.0 | 2022-01-23 |
0 | 0 | 0.0 | 2022-01-24 |
1 | 0 | xx.xx | 2022-01-25 |
2 | 0 | xx.xx | 2022-01-26 |
2 | 0 | xx.xx | 2022-01-27 |
3 | 0 | xx.xx | 2022-01-28 |
3 | 0 | xx.xx | 2022-01-29 |
3 | 1 | 92.0 | 2022-01-30 |
2 | 0 | xx.xx | 2022-01-31 |
4 | 1 | xx.xx | 2022-02-01 |
3 | 0 | xx.xx | 2022-02-02 |
3 | 0 | xx.xx | 2022-02-03 |
3 | 0 | xx.xx | 2022-02-04 |
3 | 0 | xx.xx | 2022-02-05 |
3 | 0 | xx.xx | 2022-02-06 |
3 | 0 | xx.xx | 2022-02-07 |
3 | 0 | xx.xx | 2022-02-08 |
3 | 0 | xx.xx | 2022-02-09 |
4 | 0 | xx.xx | 2022-02-10 |
4 | 0 | xx.xx | 2022-02-11 |
4 | 0 | xx.xx | 2022-02-12 |
4 | 1 | xx.xx | 2022-02-13 |
4 | 0 | xx.xx | 2022-02-14 |
4 | 1 | xx.xx | 2022-02-15 |
3 | 0 | xx.xx | 2022-02-16 |
3 | 0 | xx.xx | 2022-02-17 |
3 | 1 | xx.xx | 2022-02-18 |
3 | 0 | xx.xx | 2022-02-19 |
4 | 0 | xx.xx | 2022-02-20 |
4 | 0 | xx.xx | 2022-02-21 |
5 | 1 | xx.xx | 2022-02-22 |
4 | 0 | xx.xx | 2022-02-23 |
NOTE: I added xx.xx
as I didn’t want to manually calculate the AVG
for every day in the example. On the 2022-01-30
the average price comes from the following products being for sale with the following prices:
- Product ID 1, price at
2022-01-30
:100.0
- Product ID 2, price at
2022-01-30
:85.0
- Product ID 3, price at
2022-01-30
:91.0
AVG: (100 + 85 + 91) / 3 = 92
4
Answers
The last table in Original post, last part should be like:
Main query:
Make previous query as view (test_2) for simplicity. Then do one more time left join.
If you join
products
withproduct_prices
you will found out that same product have multiple price entry. Since in products, id is unique, meaning every product only sold once, so to get the latest price (order by price_crearted_at desc) of a product inproduct_prices
, usedistinct on
and withorder by price_crearted_at desc
https://www.db-fiddle.com/f/hBBt3jcVnXXRV5pL1yn1yc/0
fiddle
I only join to products that are already listed and not yet sold to begin with. The day of the sale is included. Notably, products sold "today" are counted among "for sale" as well as among "sold", like your expected result demonstrates.
You show type
date
, buttimestamptz
literals. I went with dates like in your fiddle. Data types matter …Why did I use
generate_series()
this way?About the
LATERAL
join:About getting the latest applicable price:
About the aggregate
FILTER
clause: