
I have a problem with doing a subquery in PostgreSQL to get a calculated column value, it reports:

[21000] ERROR: more than one row returned by a subquery used as an expression

Situation

I have two tables:

  1. accounts
  2. postal_code_to_state

Accounts table (subset of columns)

name     | postal_code | state
---------+-------------+-------
Cust One | 00020       | NULL
Cust Two | 63076       | CD

Postal Code to State

pc_from | pc_to | state
--------+-------+-------
     10 |    30 | AB
  63000 | 63100 | CD

The accounts table has rows where the state value may be unknown, but there is a postal code value. The postal code field is char (but that is incidental).

The postal_code_to_state table has integer from and to columns (pc_from, pc_to), i.e. the low and high ends of the postal code range for a state.

There is no common field to join on. To get the state from the postal_code_to_state table, the char field is cast to INT and the BETWEEN operator is used, e.g.

SELECT state
FROM postal_code_to_state
WHERE CAST('00020' AS INT) BETWEEN pc_from AND pc_to

This works OK; there is also a unique index on (pc_from, pc_to).
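
For reference, that unique index is presumably something along these lines (the exact definition isn't shown here):

-- assumption: the unique index mentioned above looks roughly like this
CREATE UNIQUE INDEX postal_code_to_state_pc_from_pc_to_idx
    ON postal_code_to_state (pc_from, pc_to);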

But I need to run a query that selects from the accounts table and populates the state column from the state column in postal_code_to_state, using the postal_code from the accounts table to pick the appropriate row.

I can’t figure out why PostgreSQL is complaining about the subquery returning multiple rows. This is the query I am currently using:

SELECT id,
       name,
       postal_code,
       state,
       (SELECT state
        FROM postal_code_to_state
        WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to) AS new_state
FROM accounts
WHERE postal_code IS NOT NULL ;

If I use LIMIT 1 in the subquery it is OK and returns the correct state value from postal_code_to_state, but I would like to have it working without needing to do that.
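
A quick way to see which accounts cause that error is to count how many rows of postal_code_to_state each postal code falls into; a rough diagnostic using the column names above:

-- sketch: list accounts whose postal code matches more than one range
SELECT id,
       postal_code,
       count(*) AS matching_ranges
FROM accounts
JOIN postal_code_to_state
  ON CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to
WHERE postal_code IS NOT NULL
GROUP BY id, postal_code
HAVING count(*) > 1;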

UPDATE 2022-10-22

@Adrian – thanks for the query to find duplicates; I had to change it a little, from != 'empty' to != FALSE.

When I run it on my data I get the following; each group of two rows (1 & 2, 3 & 4, etc.) shows an overlapping pair of ranges.

state | pc_from | pc_to
------+---------+-------
CA    |    9010 |  9134
OR    |    9070 |  9170
UD    |   33010 | 33100
PN    |   33070 | 33170
TS    |   34010 | 34149
GO    |   34070 | 34170
CB    |   86010 | 86100
IS    |   86070 | 86170

So if I run…

SELECT pc_from,
       pc_to,
       state
FROM postal_code_to_state
WHERE int4range(pc_from, pc_to) @> 9070;

I get…

pc_from | pc_to | state
--------+-------+-------
   9010 |  9134 | CA
   9070 |  9170 | OR

So, from the PostgreSQL side, the problem is clear – obviously it is the data. On the point of the data, what is shown on a site that has Italian ZIP code information is interesting:

https://zip-codes.nonsolocap.it/cap?k=12071&b=&c=

This was one of the dupes I had already removed.

The exact same ZIP code is used in two completely different provinces (states) – go figure! Given that the ZIP code is meant to resolve down to the street level, I can’t see how one code can be valid for two localities.
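
For completeness, once the overlapping ranges are cleaned up, the original goal of populating accounts.state can be done with an UPDATE ... FROM roughly like this (a sketch, assuming at most one range now matches each postal code):

-- sketch: fill in the missing state values once the ranges no longer overlap
UPDATE accounts
SET state = postal_code_to_state.state
FROM postal_code_to_state
WHERE accounts.state IS NULL
  AND accounts.postal_code IS NOT NULL
  AND CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to;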

2 Answers


  1. The combined unique index would not protect you against overlapping postal codes, only duplicates. First, I’d write the query like this

    SELECT id, name, postal_code, coalesce(accounts.state, postal_code_to_state.state) state
    FROM accounts
    LEFT JOIN postal_code_to_state ON accounts.postal_code::Integer BETWEEN pc_from AND pc_to
    WHERE accounts.state IS NOT NULL OR postal_code_to_state.state IS NOT NULL;
    

    You could modify it to tell you which are overlapping

    SELECT id, coalesce(accounts.state, postal_code_to_state.state) state
    FROM accounts
    LEFT JOIN postal_code_to_state ON accounts.postal_code::Integer BETWEEN pc_from AND pc_to
    WHERE accounts.state IS NOT NULL OR postal_code_to_state.state IS NOT NULL
    GROUP BY id,state
    HAVING count(id) > 1;
    

    I haven’t tested any of this.
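
    If the overlapping ranges cannot all be removed, a DISTINCT ON variant of the same join would keep one row per account without a LIMIT 1 subquery; this is only a sketch, and preferring the narrowest range is just an assumption about which match should win:

    SELECT DISTINCT ON (id)
           id, name, postal_code,
           coalesce(accounts.state, postal_code_to_state.state) state
    FROM accounts
    LEFT JOIN postal_code_to_state ON accounts.postal_code::Integer BETWEEN pc_from AND pc_to
    ORDER BY id, pc_to - pc_from;  -- assumption: prefer the narrowest matching range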

  2. Try this query to find duplicates:

    select 
        a_tbl.state, a_tbl.pc_from, a_tbl.pc_to 
    from 
        postal_code_to_state as a_tbl, 
       (select *  from postal_code_to_state) as b_tbl 
    where 
        a_tbl.state != b_tbl.state 
    and 
       int4range(a_tbl.pc_from, a_tbl.pc_to, '[]') && int4range(b_tbl.pc_from, b_tbl.pc_to, '[]') != 'empty';
    

    If there are duplicates, then after clearing them you can do:

    alter table 
        postal_code_to_state 
    add 
       constraint exclude_test EXCLUDE USING GIST  (int4range(pc_from, pc_to, '[]') WITH &&);
    

    This will set up an exclusion constraint to prevent overlapping ranges.

    So:

    insert into postal_code_to_state values (10, 30, 'AB'), (63000, 63100, 'CD');
    insert into postal_code_to_state values (25, 40, 'SK');
    ERROR:  conflicting key value violates exclusion constraint "exclude_test"
    
    insert into postal_code_to_state values (31, 40, 'SK');
    INSERT 0 1
    
    select * from postal_code_to_state ;
     pc_from | pc_to | state 
    ---------+-------+-------
          10 |    30 | AB
       63000 | 63100 | CD
          31 |    40 | SK
    
    
    