skip to Main Content

I’m on Postgres 15.1 and running the following query:

EXPLAIN (ANALYZE, VERBOSE)
WITH segmented_customer AS (SELECT *
                            FROM customer
                            WHERE integration_id = 162
                              AND (1 = 1 OR 1 = 1)),
     segmented_subscription
         AS (SELECT * FROM subscription WHERE integration_id = 162 AND 1 = 1),
     segmented_billing_cycle
         AS (SELECT * FROM billing_cycle WHERE integration_id = 162),
     segmented_product_version
         AS (SELECT * FROM product_version WHERE integration_id = 162 AND 1 = 1)
SELECT segmented_customer.id                         AS customer_id,
       segmented_customer.name                       AS customer_name,
       segmented_subscription.override_start_date    AS sub_override_start_ts,
       segmented_subscription.source_start_date      AS sub_source_start_ts,
       segmented_subscription.override_end_date      AS sub_override_end_ts,
       segmented_subscription.computed_end_date      AS sub_computed_end_ts,
       segmented_product_version.override_mrr_amount AS override_mrr_amount,
       segmented_product_version.total_mrr           AS total_mrr,
       segmented_product_version.version_start_date  AS version_start_ts,
       segmented_product_version.version_end_date    AS version_end_ts
FROM segmented_customer
         JOIN segmented_subscription
              ON segmented_customer.id = segmented_subscription.customer_id
         JOIN segmented_billing_cycle
              ON segmented_subscription.id = segmented_billing_cycle.subscription_id
         JOIN segmented_product_version
              ON segmented_billing_cycle.id = segmented_product_version.billing_cycle_id;

The query planner shows the following:

Gather  (cost=48644.92..114335.21 rows=1 width=105) (actual time=869.975..8133.738 rows=88061 loops=1)
"  Output: customer.id, customer.name, subscription.override_start_date, subscription.source_start_date, subscription.override_end_date, subscription.computed_end_date, product_version.override_mrr_amount, product_version.total_mrr, product_version.version_start_date, product_version.version_end_date"
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop  (cost=47644.92..113335.11 rows=1 width=105) (actual time=799.597..7986.501 rows=29354 loops=3)
"        Output: customer.id, customer.name, subscription.override_start_date, subscription.source_start_date, subscription.override_end_date, subscription.computed_end_date, product_version.override_mrr_amount, product_version.total_mrr, product_version.version_start_date, product_version.version_end_date"
        Worker 0:  actual time=755.136..7930.513 rows=28962 loops=1
          JIT:
            Functions: 29
"            Options: Inlining false, Optimization false, Expressions true, Deforming true"
"            Timing: Generation 5.048 ms, Inlining 0.000 ms, Optimization 0.922 ms, Emission 15.867 ms, Total 21.838 ms"
        Worker 1:  actual time=775.757..7927.262 rows=28968 loops=1
          JIT:
            Functions: 29
"            Options: Inlining false, Optimization false, Expressions true, Deforming true"
"            Timing: Generation 5.482 ms, Inlining 0.000 ms, Optimization 0.897 ms, Emission 16.110 ms, Total 22.489 ms"
        ->  Nested Loop  (cost=47644.49..113256.53 rows=9 width=81) (actual time=776.203..3896.817 rows=27877 loops=3)
"              Output: customer.id, customer.name, subscription.override_start_date, subscription.source_start_date, subscription.override_end_date, subscription.computed_end_date, billing_cycle.id"
              Worker 0:  actual time=730.393..3856.151 rows=27652 loops=1
              Worker 1:  actual time=730.376..3842.788 rows=27320 loops=1
              ->  Parallel Hash Join  (cost=47644.05..105823.47 rows=192 width=81) (actual time=752.267..1282.571 rows=9149 loops=3)
"                    Output: customer.id, customer.name, subscription.override_start_date, subscription.source_start_date, subscription.override_end_date, subscription.computed_end_date, subscription.id"
                    Inner Unique: true
                    Hash Cond: ((subscription.customer_id)::text = (customer.id)::text)
                    Worker 0:  actual time=694.538..1252.813 rows=8950 loops=1
                    Worker 1:  actual time=694.479..1230.973 rows=9024 loops=1
                    ->  Parallel Bitmap Heap Scan on public.subscription  (cost=322.92..58470.94 rows=11962 width=62) (actual time=8.967..527.330 rows=9149 loops=3)
"                          Output: subscription.id, subscription.source_id, subscription.source_start_date, subscription.source_end_date, subscription.status, subscription.attributes, subscription.created_at, subscription.updated_at, subscription.integration_id, subscription.organization_id, subscription.trial_end_date, subscription.trial_start_date, subscription.last_synced_at, subscription.computed_end_date, subscription.override_end_date, subscription.customer_id, subscription.override_start_date"
                          Recheck Cond: (subscription.integration_id = 162)
                          Heap Blocks: exact=4856
                          Worker 0:  actual time=8.969..556.056 rows=8950 loops=1
                          Worker 1:  actual time=8.954..533.827 rows=9024 loops=1
                          ->  Bitmap Index Scan on subscription_integration_id_e53b1bd5  (cost=0.00..315.75 rows=28709 width=0) (actual time=7.151..7.155 rows=27446 loops=1)
                                Index Cond: (subscription.integration_id = 162)
                                Worker 1:  actual time=7.151..7.155 rows=27446 loops=1
                    ->  Parallel Hash  (cost=47189.88..47189.88 rows=10500 width=41) (actual time=741.375..741.385 rows=8743 loops=3)
"                          Output: customer.id, customer.name"
                          Buckets: 32768  Batches: 1  Memory Usage: 2176kB
                          Worker 0:  actual time=683.803..683.810 rows=7868 loops=1
                          Worker 1:  actual time=683.794..683.808 rows=8225 loops=1
                          ->  Parallel Bitmap Heap Scan on public.customer  (cost=283.74..47189.88 rows=10500 width=41) (actual time=26.742..736.863 rows=8743 loops=3)
"                                Output: customer.id, customer.name"
                                Recheck Cond: (customer.integration_id = 162)
                                Heap Blocks: exact=6420
                                Worker 0:  actual time=17.125..679.507 rows=7868 loops=1
                                Worker 1:  actual time=17.345..679.678 rows=8225 loops=1
                                ->  Bitmap Index Scan on customer_integration_id_afcc30ba  (cost=0.00..277.44 rows=25201 width=0) (actual time=15.062..15.069 rows=26228 loops=1)
                                      Index Cond: (customer.integration_id = 162)
              ->  Index Scan using billing_cycle_subscription_id_b7f65c8b on public.billing_cycle  (cost=0.43..38.70 rows=1 width=16) (actual time=0.131..0.285 rows=3 loops=27446)
"                    Output: billing_cycle.id, billing_cycle.period_start_date, billing_cycle.period_end_date, billing_cycle.created_at, billing_cycle.updated_at, billing_cycle.subscription_id, billing_cycle.invoice_id, billing_cycle.integration_id, billing_cycle.organization_id, billing_cycle.last_synced_at, billing_cycle.is_trial_period"
                    Index Cond: (billing_cycle.subscription_id = subscription.id)
                    Filter: (billing_cycle.integration_id = 162)
                    Worker 0:  actual time=0.135..0.290 rows=3 loops=8950
                    Worker 1:  actual time=0.132..0.288 rows=3 loops=9024
        ->  Index Scan using product_version_billing_cycle_id_c8642f61 on public.product_version  (cost=0.43..8.72 rows=1 width=40) (actual time=0.140..0.146 rows=1 loops=83631)
"              Output: product_version.id, product_version.name, product_version.version_start_date, product_version.version_end_date, product_version.product_mrr, product_version.override_mrr_amount, product_version.quantity, product_version.source_product_id, product_version.source_price_id, product_version.is_metered, product_version.created_at, product_version.updated_at, product_version.billing_cycle_id, product_version.changed_date, product_version.invoice_id, product_version.integration_id, product_version.organization_id, product_version.last_synced_at, product_version.coupon_mrr, product_version.coupon_name, product_version.is_trial_version, product_version.tax_mrr, product_version.tax_name, product_version.total_mrr, product_version.attributes, product_version.query_grouping_id"
              Index Cond: (product_version.billing_cycle_id = billing_cycle.id)
              Filter: (product_version.integration_id = 162)
              Worker 0:  actual time=0.141..0.147 rows=1 loops=27652
              Worker 1:  actual time=0.142..0.149 rows=1 loops=27320
Planning Time: 9.807 ms
JIT:
  Functions: 87
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 22.063 ms, Inlining 0.000 ms, Optimization 4.291 ms, Emission 57.598 ms, Total 83.952 ms"
Execution Time: 8151.987 ms

Note, for example, that for billing_cycle every column is included in Output even though only id and subscription_id are used as join keys. Why is that, and does that introduce a performance issue?

I’ve tried Googling around and looked at the source documentation a bunch but can’t find what this Output means exactly.

2

Answers


  1. Re-forming tuples is a lot of work. Doing it more than necessary will slow things down. So it doesn’t re-form a tuple every time it might be possible to remove a column from it. A later node can remove the extra columns while it is also doing other re-forming work that it needs to do.

    Login or Signup to reply.
  2. Unless you do want to control whether those get materialized, you’re just typing more and nesting more. It technically makes little to no difference whether you keep your CTE’s or switch to subqueries, or add those as regular where conditions at the end, or as additional join conditions: demo.

    I think the Output lists behave like that due to how these scans work: seq and non-covering index scan still ends up grabbing the row from the heap. It’ll pass all that further up, even when unrequested, leaving the job of discarding to be handled by any other part of the query that actually needs it. It’ll even claim it’s passing huge values I intentionally TOASTed on storage external:

    QUERY PLAN
    Nested Loop (cost=0.00..58.14 rows=1 width=292) (actual time=0.018..0.021 rows=1 loops=1)
      Output: customer.id, customer.name, subscription.override_start_date, subscription.source_start_date, subscription.override_end_date, subscription.computed_end_date, product_version.override_mrr_amount, product_version.total_mrr, product_version.version_start_date, product_version.version_end_date
                  -> Seq Scan on public.billing_cycle (cost=0.00..1.02 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1)
                        Output: billing_cycle.id, billing_cycle.subscription_id, billing_cycle.integration_id, billing_cycle.something_toasted, billing_cycle.organization_id

    But even if something gets passed, I doubt those really are actually de-TOASTed values.

    You can see the behaviour change with the index-only scan on covering index that happens to hold everything you wanted to use in the query, in the subset of columns it’s indexing:

    QUERY PLAN
    Nested Loop (cost=0.29..65.84 rows=1 width=292) (actual time=0.033..0.037 rows=1 loops=1)
      Output: segmented_customer.id, segmented_customer.name, segmented_subscription.override_start_date, segmented_subscription.source_start_date, segmented_subscription.override_end_date, segmented_subscription.computed_end_date, segmented_product_version.override_mrr_amount, segmented_product_version.total_mrr, segmented_product_version.version_start_date, segmented_product_version.version_end_date
            -> Index Only Scan using billing_cycle_id_subscription_id_integration_id_idx on public.billing_cycle segmented_billing_cycle (cost=0.29..4.31 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
                  Output: segmented_billing_cycle.id, segmented_billing_cycle.subscription_id, segmented_billing_cycle.integration_id

    It recognizes that you’re not referencing anything beyond what’s in the index, so it doesn’t try to go fetch the rest of the columns.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search