skip to Main Content

I have these SQL queries:

EXPLAIN ANALYZE
SELECT 
    * ,
    (
        SELECT COUNT(*) FROM orders 
        WHERE orders.customer_id = customers.id
    ) as orders_count
FROM customers
LIMIT 10

The execution plan for this query:

enter image description here

EXPLAIN ANALYZE
SELECT 
    * ,
    (
        SELECT COUNT(*) FROM orders 
        WHERE orders.customer_id = customers.id
    ) as orders_count
FROM customers
ORDER BY id
LIMIT 10

The execution plan for this query:

enter image description here

EXPLAIN ANALYZE
SELECT 
    * ,
    (
        SELECT COUNT(*) FROM orders 
        WHERE orders.customer_id = customers.id
    ) as orders_count
FROM customers
ORDER BY orders_count
LIMIT 10

The execution plan for this query:

enter image description here

How can you know from these queries and execution plans when the subquery "orders_count" is computed and assigned to each row in the table, and when it is restricted to only the top 10 rows?

Given a large number of records in the customers table, I aim to annotate as few rows as possible, rather than annotating all millions of records.

2

Answers


  1. In the explain plan, on line 5 (or 8 for the last one) it shows rows=10, so it looks like the SQL planner was able to use the limit in the subquery. We can’t see the number of loops on that line, though, so that’s not definitive.

    The third one, however, shows on line 7 that aggregating it is taking 10000 loops. Since the order requires the sub-query, there isn’t a way to limit it without calculating it for the whole table.

    Login or Signup to reply.
  2. In all three queries the scalar subquery:

    (
      SELECT COUNT(*) FROM orders 
      WHERE orders.customer_id = customers.id
    ) as orders_count
    

    is defined as correlated to the main query.

    In the first two cases, the optimizer does not need this value in the main query so it’s only computed for the 10 rows that are finally selected.

    In the last case this value is needed to sort the rows and then to limit them. This subquery will me computed for each row (all millions of them), each time the query is executed. You can see that by the "Limit" operator being applied only after the "Sort" operator. It’s its parent.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search