skip to Main Content

This is a strange test case from the Postgres regression test suite.

It starts with a table named tenk, which has 10,000 rows in it. The table has a column called unique1, both of which contain values from 0 to 9999, each row in the table having a unique value for that column for each column. (ie. one row has 369 for unique1 and 6521 for unique2, and no other row has those values, though there’s a row that has 369 for unique2 and a row that has 6521 for unique1.)

The test query is:

select
  (select max((select i.unique2 from tenk1 i where i.unique1 = o.unique1)))
from tenk1 o;

This returns a single row with a value of 9999. And that feels odd. Here’s how I parse the query in my own head:

Select [subquery] from tenk1 o

Subquery selects max(i.unique2) from tenki, where i.unique1 = o.unique1

This is basically a really weird way to write a self-join.

For each row in o, select the max i.unique2 value that corresponds to it.

There are 10,000 rows in o, each will have exactly 1 match in i.

Return 10,000 rows in the result set.

Where am I getting this wrong?

EDIT: The behavior is even stranger when you change the code around a little.

The SELECT seems to be equivalent to a join between tables i and o, but if you rewrite it as one:

select
  (select max((select i.unique2 from tenk1 i join tenk1 o on i.unique1 = o.unique1)))

it errors out:

ERROR: more than one row returned by a subquery used as an expression

But if you instead remove the max from the original query,

select
  (select (select i.unique2 from tenk1 i where i.unique1 = o.unique1))
from tenk1 o;

you get a result set of 10,000 elements, as expected. The database seems to have no trouble with a subquery returning more than one row here! Remove the table o from that inexplicably valid query,

select
  (select (select i.unique2 from tenk1 i))

and the "more than one row" error returns!

This is official PostgreSQL behavior, that the test suite requires the database to uphold. The rule set forth by the error message says you can’t have a subquery used as an expression that returns more than one row. What it it about this strange not-quite-join to a table in an external scope that circumvents this rule?

2

Answers


  1. select (select max(i.unique2) from tenk1 i where i.unique1 = o.unique1)
    from tenk1 o
    

    would do what you said, but the query doesn’t do that. It does an inner

    select i.unique2 from tenk1 i where i.unique1 = o.unique1
    

    (which is, as you say, a weird, even pointless, self-join, but this is a test case). That produces one row per row of the "outer" tenk1. Then it it uses max to get the single largest value of unique2 that came out of the query (which is also the largest value in the table).

    Login or Signup to reply.
  2. It is test for a regression.

    Per aggregrate.sql:

    — Test handling of sublinks within outer-level aggregates.
    — Per bug report from Daniel Grace.*

    Where bug report is aggregate over subselects fails:

    The following nonsensical query causes PostgreSQL to fail with ERROR: plan
    should not reference subplan’s variable. (This was stripped down from an
    ‘useful’ query that triggered the same bug). First encountered on 8.3.4,
    reproduced on 8.3.7

    Patch is Aggregrate patch:

    Fix the handling of sub-SELECTs appearing in the arguments of an outer-level
    aggregate function. By definition, such a sub-SELECT cannot reference any
    variables of query levels between itself and the aggregate’s semantic level
    (else the aggregate would’ve been assigned to that lower level instead).
    So the correct, most efficient implementation is to treat the sub-SELECT as
    being a sub-select of that outer query level, not the level the aggregate
    syntactically appears in. Not doing so also confuses the heck out of our
    parameter-passing logic, as illustrated in bug report from Daniel Grace.

    Fortunately, we were already copying the whole Aggref expression up to the
    outer query level, so all that’s needed is to delay SS_process_sublinks
    processing of the sub-SELECT until control returns to the outer level.

    This has been broken since we introduced spec-compliant treatment of
    outer aggregates in 7.4; so patch all the way back.

    With the aggregates.sql (r1.13.4.1 -> r1.13.4.2) patch being:

    + -- Test handling of sublinks within outer-level aggregates.
    + -- Per bug report from Daniel Grace.
    + select
    +   (select max((select i.unique2 from tenk1 i where i.unique1 = o.unique1)))
    + from tenk1 o;
    + 
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search