Working in PostgreSQL, I intend to use the output of a CTE as a reference to join another CTE below it.
I’ve used this style countless times in SQL Server and Synapse, however I’m finding it difficult to replicate in PostgreSQL.
WITH RECURSIVE CTE_mst_rv_prod AS (
--CTE to answer the product with highest review
SELECT product_id, COUNT(review) "total_reviews",
RANK() OVER(ORDER BY COUNT(review) DESC) "Rank"
FROM public.reviews
GROUP BY product_id
ORDER BY RANK() OVER(ORDER BY COUNT(review) DESC)
LIMIT 1
),
CTE_pd_or_dt AS (
--CTE to answer the day product was ordered the most
SELECT P.product_id, O.order_date "MOST_ORDERED_DAY", COUNT(order_id) AS "COUNT_OF_ORDERS"
FROM CTE_mst_rv_prod P
JOIN public.orders O ON P.product_id = O.product_id
GROUP BY P.product_id, O.order_date
ORDER BY COUNT(order_id) DESC
LIMIT 1
),
CTE_or_dt AS(--here's the CTE that's unable to use column from the CTE above it
--CTE to answer if d day with highest order is a public holiday or not
SELECT D.calendar_dt AS "MOST_ORDERED_DAY",
CASE
WHEN D.day_of_the_week_num IN (1,5) AND working_day = false THEN TRUE
ELSE FALSE
END is_public_holiday
FROM CTE_pd_or_dt
INNER JOIN public.dim_dates D ON
CTE_pd_or_dt.MOST_ORDERED_DAY = D.calendar_dt
),
--CTE to answer the timeliness problem
CTE_shp_cond AS(
SELECT O.order_date, O.product_id, S.shipment_date, S.delivery_date,
O.order_id, CASE WHEN S.shipment_date - O.order_date >= 6 AND
delivery_date ISNULL THEN 'late' ELSE 'early'
END "shipment_condition"
FROM public.orders O
JOIN public.shipments_deliveries S ON S.order_id = O.order_id
JOIN CTE_mst_rv_prod RV ON O.product_id = RV.product_id
)
SELECT A.product_id, B.MOST_ORDERED_DAY, C.is_public_holiday, A.total_reviews,
(SELECT ROUND((COUNT(*) * 100.00)/ top_product.total_reviews, 4)
FROM public.reviews AS r2
JOIN CTE_mst_rv_prod "mrp" ON mrp.product_id = r2.product_id
WHERE r2.review = 1) AS "pct_one_star_review",
(SELECT ROUND((COUNT(*) * 100.00)/ top_product.total_reviews, 4)
FROM public.reviews AS r2
JOIN CTE_mst_rv_prod "mrp" ON mrp.product_id = r2.product_id
WHERE r2.review = 2) AS pct_two_star_review,
(SELECT ROUND((COUNT(*) * 100.00)/ top_product.total_reviews, 4)
FROM public.reviews AS r2
JOIN CTE_mst_rv_prod "mrp" ON mrp.product_id = r2.product_id
WHERE r2.review = 3) AS pct_three_star_review,
(SELECT ROUND((COUNT(*) * 100.00)/ top_product.total_reviews, 4)
FROM public.reviews AS r2
JOIN CTE_mst_rv_prod "mrp" ON mrp.product_id = r2.product_id
WHERE r2.review = 4) AS pct_four_star_review,
(SELECT ROUND((COUNT(*) * 100.00)/ top_product.total_reviews, 4)
FROM public.reviews AS r2
JOIN CTE_mst_rv_prod "mrp" ON mrp.product_id = r2.product_id
WHERE r2.review = 5) AS pct_five_star_review
FROM CTE_mst_rv_prod A
LEFT JOIN CTE_pd_or_dt B ON A.product_id = B.product_id
LEFT JOIN CTE_or_dt C ON C.order_date = B.MOST_ORDERED_DAY
Here’s my error message:
ERROR: column cte_pd_or_dt.most_ordered_day does not exist
LINE 28: CTE_pd_or_dt.MOST_ORDERED_DAY = D.calendar_dt
^SQL state: 42703
Character: 950
I’m working in PostgreSQL and this is a real stress.
2
Answers
I've been able to fix it, the issue is because in postgresql, CTE aliases are only available in that primary block, so when I do my join based on the initial column name the error disappear and the join work.
Notice that the error message uses a lowercase name even though you specified UPPERCASE.
Quoting the doc:
If you look close enough, that’s exactly what you missed in the CTE that fails, in the
INNER JOIN...ON
condition:Anytime you feel like using UPPERCASE in an alias/identifier, remember you’ll have to double quote it then and there, as well as anytime you want to reference it later. And anytime you or anyone else forgets that, or uses a tool that auto-quotes, or auto-quotes after applying pre-configured PascalCase, camelCase, snake_case, UPPERCASE or anything other than lowercase PostgreSQL defaults to, it won’t match the unquoted stuff.