I have a problem using a subquery in PostgreSQL to get a calculated column value; it reports:
[21000] ERROR: more than one row returned by a subquery used as an expression
Situation
I have two tables:
- accounts
- postal_code_to_state
Accounts table (subset of columns)
| name | postal_code | state |
|---|---|---|
| Cust One | 00020 | NULL |
| Cust Two | 63076 | CD |
Postal Code to State
| pc_from | pc_to | state |
|---|---|---|
| 10 | 30 | AB |
| 63000 | 63100 | CD |
The `accounts` table has rows where the state value may be unknown, but there is a postal code value. The postal code field is char (but that is incidental).
The `postal_code_to_state` table has rows with postal code (integer) from and to columns. That is the low/high numbers of the postal code range for a state.
There is no common field to do joins. To get the state from the `postal_code_to_state` table, the char field is cast to INT and the BETWEEN operator is used, e.g.
SELECT state
FROM postal_code_to_state
WHERE CAST('00020' AS INT) BETWEEN pc_from AND pc_to
This works OK; there is also a unique index on `pc_from` and `pc_to`.
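To make the setup concrete, the schema is roughly as follows (a minimal sketch only; the exact column types, the id definition and the index name here are approximate):

```sql
-- Sketch of the two tables as described above (types and names approximate).
CREATE TABLE accounts (
    id          serial PRIMARY KEY,
    name        text,
    postal_code char(5),   -- character type, e.g. '00020'
    state       char(2)    -- may be NULL
);

CREATE TABLE postal_code_to_state (
    pc_from integer NOT NULL,  -- low end of the postal code range
    pc_to   integer NOT NULL,  -- high end of the postal code range
    state   char(2) NOT NULL
);

CREATE UNIQUE INDEX postal_code_to_state_range_uq
    ON postal_code_to_state (pc_from, pc_to);
```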
But I need to run a query that selects from the `accounts` table and populates its `state` column from the `state` column in `postal_code_to_state`, using the account's `postal_code` to select the appropriate row.
I can’t figure out why PostgreSQL is complaining about the subquery returning multiple rows. This is the query I am currently using:
SELECT id,
name,
postal_code,
state,
(SELECT state
FROM postal_code_to_state
WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to) AS new_state
FROM accounts
WHERE postal_code IS NOT NULL ;
If I use `LIMIT 1` in the subquery it is OK and it returns the correct state value from `postal_code_to_state`, but I would like to have it working without needing to do that.
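For reference, that workaround is just the query above with `LIMIT 1` added to the subquery:

```sql
SELECT id,
       name,
       postal_code,
       state,
       (SELECT state
        FROM postal_code_to_state
        WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to
        LIMIT 1) AS new_state   -- silences the error, but picks an arbitrary matching row
FROM accounts
WHERE postal_code IS NOT NULL;
```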
UPDATE 2022-10-22
@Adrian, thanks for the query to find duplicates. I had to change your query a little, from `!= 'empty'` to `!= FALSE`.
When I run it on the data I get the following; each group of two rows (1 & 2, 3 & 4, etc.) shows a pair of overlapping ranges.
| state | pc_from | pc_to |
|---|---|---|
| CA | 9010 | 9134 |
| OR | 9070 | 9170 |
| UD | 33010 | 33100 |
| PN | 33070 | 33170 |
| TS | 34010 | 34149 |
| GO | 34070 | 34170 |
| CB | 86010 | 86100 |
| IS | 86070 | 86170 |
So if I run…
SELECT pc_from,
pc_to,
state
FROM postal_code_to_state
WHERE int4range(pc_from, pc_to) @> 9070;
I get…
| pc_from | pc_to | state |
|---|---|---|
| 9010 | 9134 | CA |
| 9070 | 9170 | OR |
So, from the PostgreSQL side the problem is clear: it is the data. On the point of the data, what a site with Italian ZIP code information shows is interesting:
https://zip-codes.nonsolocap.it/cap?k=12071&b=&c=
This was one of the dupes I had already removed.
The exact same ZIP code is used in two completely different provinces (states) – go figure! Given that the ZIP code is meant to resolve down to the street level, I can’t see how one code can be valid for two localities.
2 Answers
The combined unique index would not protect you against overlapping postal codes, only duplicates. First, I'd write the query like this:
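A sketch along those lines (untested, not the answer's exact query), replacing the correlated subquery with a join:

```sql
SELECT a.id,
       a.name,
       a.postal_code,
       a.state,
       p.state AS new_state
FROM accounts a
LEFT JOIN postal_code_to_state p
       ON CAST(a.postal_code AS INT) BETWEEN p.pc_from AND p.pc_to
WHERE a.postal_code IS NOT NULL;
-- With a join, an account whose postal code falls into two ranges
-- simply produces two rows instead of raising an error.
```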
You could modify it to tell you which ones are overlapping:
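For example (again a sketch, not the answer's exact query), grouping per account and keeping only the postal codes that match more than one range:

```sql
SELECT a.id,
       a.postal_code,
       count(*)           AS matching_ranges,
       array_agg(p.state) AS candidate_states
FROM accounts a
JOIN postal_code_to_state p
     ON CAST(a.postal_code AS INT) BETWEEN p.pc_from AND p.pc_to
WHERE a.postal_code IS NOT NULL
GROUP BY a.id, a.postal_code
HAVING count(*) > 1;   -- only postal codes that hit overlapping ranges
```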
I haven’t tested any of this.
Try this query to find duplicates:
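The answer's exact query is not reproduced here (per the question's update, it compared an intersection against 'empty'); a sketch that finds rows whose range overlaps another row's range, using a self-join and the range overlap operator, could be:

```sql
-- Report every row whose [pc_from, pc_to] range overlaps some other row's range.
-- The (pc_from, pc_to) pair is used to keep a row from matching itself,
-- relying on the unique index mentioned in the question.
SELECT a.state, a.pc_from, a.pc_to
FROM postal_code_to_state a
JOIN postal_code_to_state b
  ON (a.pc_from, a.pc_to) <> (b.pc_from, b.pc_to)
 AND int4range(a.pc_from, a.pc_to, '[]') && int4range(b.pc_from, b.pc_to, '[]')
ORDER BY a.pc_from, a.pc_to;
```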
If there are duplicates, then after clearing them you can do:
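A sketch of such a constraint (the constraint name is made up); int4range has built-in GiST support, so no extra extension is needed for this form:

```sql
ALTER TABLE postal_code_to_state
  ADD CONSTRAINT postal_code_to_state_no_overlap
  EXCLUDE USING gist (int4range(pc_from, pc_to, '[]') WITH &&);
-- Any insert or update that would create an overlapping range is now rejected.
```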
This will set up an exclusion constraint to prevent overlapping ranges.
So:
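The answer's final query is not reproduced here. Once the ranges can no longer overlap, the original scalar subquery returns at most one row, and populating the unknown states could look like this sketch (the `state IS NULL` filter reflects the question's "state may be unknown" rows):

```sql
UPDATE accounts a
SET    state = p.state
FROM   postal_code_to_state p
WHERE  a.state IS NULL
  AND  a.postal_code IS NOT NULL
  AND  CAST(a.postal_code AS INT) BETWEEN p.pc_from AND p.pc_to;
-- Safe only once the ranges are guaranteed not to overlap;
-- otherwise the row chosen per account is indeterminate.
```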