I wrote this query and i got "missing FROM clause
WITH t1 AS (
SELECT
a.name account_name,
SUM(o.standard_qty) total_std,
SUM(o.total) total
FROM
accounts a
JOIN orders o ON o.account_id = a.id
GROUP BY
1
ORDER BY
2 DESC
LIMIT
1
),
t2 AS (
SELECT
a.name
FROM
orders o
JOIN accounts a ON a.id = o.account_id
GROUP BY
1
HAVING
SUM(o.total) > t1.total
)
SELECT
COUNT(*)
FROM
t2;
but when I replace ‘t1.total’ with (SELECT total FROM t1)
it works, I need to understand why and what is the difference.
2
Answers
The difference between these two queries lies in the subquery part. In the original query, you’re using the MAX(total) function within the subquery, which calculates the maximum value of the total column across all rows in the table t1. This subquery returns a single scalar value – the maximum value of the total column.
In the modified query, you’re using (SELECT total FROM t1) within the subquery, which selects the total column from the t1 table. This subquery returns a set of values – all the individual values in the total column of the t1 table.
So, why does the modified query work while the original one doesn’t?
Scalar Value vs. Set of Values:
In the original query, when you use (SELECT MAX(total) FROM t1) in the subquery, it returns a single scalar value which represents the maximum value in the entire total column.
In the modified query, when you use (SELECT total FROM t1) in the subquery, it returns a set of values, which includes all individual values in the total column.
Comparison with a Single Value:
In the original query, the comparison is between t1.total (a single value from the outer query) and the maximum value of the total column (a single value from the subquery). This might lead to unexpected results if there are multiple rows with values equal to the maximum.
In the modified query, the comparison is between t1.total (a single value from the outer query) and a set of values from the subquery. SQL’s behavior in this case depends on the database system you’re using. Some databases might implicitly apply the comparison to each element in the set, which can lead to more predictable results.
So, if you’re using a database system that handles comparisons between single values and sets of values in a reasonable way, the modified query might work better. However, it’s important to understand the behavior of the specific database you’re using to ensure your query works as expected in all scenarios.
In t2, there is nothing which refers back to the t1 CTE. This is why using
t1.total
fails. When you use(SELECT total FROM t1)
instead this is a self-contained subquery with t1 referred to in the FROM clause.If you want to select data from t1 in the t2 CTE, you will need to join t1 to your query. For example: