skip to Main Content

The execution order in SQL is the following: FROM –> WHERE –> GROUP BY –> HAVING –> SELECT –> ORDER BY –> LIMIT

In my query I am using a CASE WHEN statement and an aggregate function in the SELECT statement and then GROUPING it by the CASE WHEN column. But in the GROUP BY clause I dont’ repeat the CASE WHEN construction but just use the column alias "shipping_cost".

SELECT 
  CASE
    WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN 0.0
    ELSE 10.0
  END AS shipping_cost,
  COUNT(*) AS order_count
FROM orders
GROUP BY shipping_cost;

According to what is mentioned above, since the GROUP BY statement is executed before the SELECT statement, how can the query run successfully and not return an error? I am working in PostgresSQL.

Note: I realize this question may have been asked many times but to be honest I haven’t been able to find a satisfying answer to it yet

2

Answers


  1. Use EXPLAIN ANALYZE VERBOSE to see what exactly PostgreSQL does with your query.

    The execution order you described is generally, more or less how it works, but ultimately the last word belongs to the planner. In your case, the planner sees it only makes sense to do the case before aggregation and it does exactly that: demo

    explain analyze verbose
    SELECT 
      CASE
        WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN 0.0
        ELSE 10.0
      END AS shipping_cost,
      COUNT(*) AS order_count
    FROM orders
    GROUP BY shipping_cost;
    
    QUERY PLAN
    HashAggregate (cost=35.40..38.40 rows=200 width=40) (actual time=0.088..0.089 rows=2 loops=1)
      Output: (CASE WHEN ((ship_country = ‘USA’::text) OR (ship_country = ‘Canada’::text)) THEN 0.0 ELSE 10.0 END), count(*)
      Group Key: CASE WHEN ((orders.ship_country = ‘USA’::text) OR (orders.ship_country = ‘Canada’::text)) THEN 0.0 ELSE 10.0 END
      Batches: 1 Memory Usage: 40kB
      -> Seq Scan on public.orders (cost=0.00..29.05 rows=1270 width=32) (actual time=0.078..0.081 rows=5 loops=1)
            Output: CASE WHEN ((ship_country = ‘USA’::text) OR (ship_country = ‘Canada’::text)) THEN 0.0 ELSE 10.0 END
    Planning Time: 0.068 ms
    Execution Time: 0.127 ms
    Login or Signup to reply.
  2. The order of execution of a SELECT statement is the logical order. An SQL implementation is free to do it some other way as long as the result is the same.

    In this case, PostgreSQL extends the SQL standard in a well documented way:

    An expression used inside a grouping_element can be an input column name, or the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name.

    This is further explained in the section on SQL standard compatibility:

    In the SQL-92 standard, an ORDER BY clause can only use output column names or numbers, while a GROUP BY clause can only use expressions based on input column names. PostgreSQL extends each of these clauses to allow the other choice as well (but it uses the standard’s interpretation if there is ambiguity). PostgreSQL also allows both clauses to specify arbitrary expressions. Note that names appearing in an expression will always be taken as input-column names, not as output-column names.

    SQL:1999 and later use a slightly different definition which is not entirely upward compatible with SQL-92. In most cases, however, PostgreSQL will interpret an ORDER BY or GROUP BY expression the same way SQL:1999 does.

    If you don’t like the feature, don’t use it and stay standard compliant.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search