I have a problem using a subquery in PostgreSQL to get a calculated column value; it reports:
[21000] ERROR: more than one row returned by a subquery used as an expression
Situation
I have two tables:
- accounts
- postal_code_to_state
Accounts table (subset of columns)
| name | postal_code | state |
|---|---|---|
| Cust One | 00020 | NULL |
| Cust Two | 63076 | CD |
Postal Code to State
| pc_from | pc_to | state |
|---|---|---|
| 10 | 30 | AB |
| 63000 | 63100 | CD |
The `accounts` table has rows where the state value may be unknown, but there is a postal code value. The postal code field is char (but that is incidental).
The `postal_code_to_state` table has rows with postal code (integer) from and to columns. That is the low/high numbers of the postal code range for a state.
There is no common field to do joins. To get the state from the `postal_code_to_state` table, the char field is cast to INT and the BETWEEN operator is used, e.g.
SELECT state
FROM postal_code_to_state
WHERE CAST('00020' AS INT) BETWEEN pc_from AND pc_to
This works OK; there is also a unique index on `pc_from` and `pc_to`.
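To make the setup concrete, the schema is roughly as follows (a minimal sketch only; the exact column types, the id definition and the index name here are approximate):

```sql
-- Sketch of the two tables as described above (types and names approximate).
CREATE TABLE accounts (
    id          serial PRIMARY KEY,
    name        text,
    postal_code char(5),   -- character type, e.g. '00020'
    state       char(2)    -- may be NULL
);

CREATE TABLE postal_code_to_state (
    pc_from integer NOT NULL,  -- low end of the postal code range
    pc_to   integer NOT NULL,  -- high end of the postal code range
    state   char(2) NOT NULL
);

CREATE UNIQUE INDEX postal_code_to_state_range_uq
    ON postal_code_to_state (pc_from, pc_to);
```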
But I need to run a query that selects from the `accounts` table and populates its `state` column from the `state` column in `postal_code_to_state`, using the account's `postal_code` to select the appropriate row.
I can’t figure out why PostgreSQL is complaining about the subquery returning multiple rows. This is the query I am currently using:
SELECT id,
name,
postal_code,
state,
(SELECT state
FROM postal_code_to_state
WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to) AS new_state
FROM accounts
WHERE postal_code IS NOT NULL ;
If I use `LIMIT 1` in the subquery it is OK and it returns the correct state value from `postal_code_to_state`, but I would like to have it working without needing to do that.
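For reference, that workaround is just the query above with `LIMIT 1` added to the subquery:

```sql
SELECT id,
       name,
       postal_code,
       state,
       (SELECT state
        FROM postal_code_to_state
        WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to
        LIMIT 1) AS new_state   -- silences the error, but picks an arbitrary matching row
FROM accounts
WHERE postal_code IS NOT NULL;
```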
UPDATE 2022-10-22
@Adrian, thanks for the query to find duplicates. I had to change your query a little, from `!= 'empty'` to `!= FALSE`.
When I run it on the data I get the following; each group of two rows (1 & 2, 3 & 4, etc.) shows a pair of overlapping ranges.
| state | pc_from | pc_to |
|---|---|---|
| CA | 9010 | 9134 |
| OR | 9070 | 9170 |
| UD | 33010 | 33100 |
| PN | 33070 | 33170 |
| TS | 34010 | 34149 |
| GO | 34070 | 34170 |
| CB | 86010 | 86100 |
| IS | 86070 | 86170 |
So if I run…
SELECT pc_from,
pc_to,
state
FROM postal_code_to_state
WHERE int4range(pc_from, pc_to) @> 9070;
I get…
| pc_from | pc_to | state |
|---|---|---|
| 9010 | 9134 | CA |
| 9070 | 9170 | OR |
So, from the PostgreSQL side the problem is clear: it is the data. On the point of the data, what a site with Italian ZIP code information shows is interesting:
https://zip-codes.nonsolocap.it/cap?k=12071&b=&c=
This was one of the dupes I had already removed.
The exact same ZIP code is used in two completely different provinces (states) – go figure! Given that the ZIP code is meant to resolve down to the street level, I can’t see how one code can be valid for two localities.
2 Answers
The combined unique index would not protect you against overlapping postal codes, only duplicates. First, I'd write the query like this:
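A sketch along those lines (untested, not the answer's exact query), replacing the correlated subquery with a join:

```sql
SELECT a.id,
       a.name,
       a.postal_code,
       a.state,
       p.state AS new_state
FROM accounts a
LEFT JOIN postal_code_to_state p
       ON CAST(a.postal_code AS INT) BETWEEN p.pc_from AND p.pc_to
WHERE a.postal_code IS NOT NULL;
-- With a join, an account whose postal code falls into two ranges
-- simply produces two rows instead of raising an error.
```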
You could modify it to tell you which ones are overlapping:
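For example (again a sketch, not the answer's exact query), grouping per account and keeping only the postal codes that match more than one range:

```sql
SELECT a.id,
       a.postal_code,
       count(*)           AS matching_ranges,
       array_agg(p.state) AS candidate_states
FROM accounts a
JOIN postal_code_to_state p
     ON CAST(a.postal_code AS INT) BETWEEN p.pc_from AND p.pc_to
WHERE a.postal_code IS NOT NULL
GROUP BY a.id, a.postal_code
HAVING count(*) > 1;   -- only postal codes that hit overlapping ranges
```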
I haven’t tested any of this.
Try this query to find duplicates:
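The answer's exact query is not reproduced here (per the question's update, it compared an intersection against 'empty'); a sketch that finds rows whose range overlaps another row's range, using a self-join and the range overlap operator, could be:

```sql
-- Report every row whose [pc_from, pc_to] range overlaps some other row's range.
-- The (pc_from, pc_to) pair is used to keep a row from matching itself,
-- relying on the unique index mentioned in the question.
SELECT a.state, a.pc_from, a.pc_to
FROM postal_code_to_state a
JOIN postal_code_to_state b
  ON (a.pc_from, a.pc_to) <> (b.pc_from, b.pc_to)
 AND int4range(a.pc_from, a.pc_to, '[]') && int4range(b.pc_from, b.pc_to, '[]')
ORDER BY a.pc_from, a.pc_to;
```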
If there are duplicates, then after clearing them you can do:
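A sketch of such a constraint (the constraint name is made up); int4range has built-in GiST support, so no extra extension is needed for this form:

```sql
ALTER TABLE postal_code_to_state
  ADD CONSTRAINT postal_code_to_state_no_overlap
  EXCLUDE USING gist (int4range(pc_from, pc_to, '[]') WITH &&);
-- Any insert or update that would create an overlapping range is now rejected.
```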
This will set up an exclusion constraint to prevent overlapping ranges.
So:
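The answer's final query is not reproduced here. Once the ranges can no longer overlap, the original scalar subquery returns at most one row, and populating the unknown states could look like this sketch (the `state IS NULL` filter reflects the question's "state may be unknown" rows):

```sql
UPDATE accounts a
SET    state = p.state
FROM   postal_code_to_state p
WHERE  a.state IS NULL
  AND  a.postal_code IS NOT NULL
  AND  CAST(a.postal_code AS INT) BETWEEN p.pc_from AND p.pc_to;
-- Safe only once the ranges are guaranteed not to overlap;
-- otherwise the row chosen per account is indeterminate.
```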