I have a table like this:
CREATE TABLE "users"
(
id serial PRIMARY KEY,
town text NOT NULL,
street text NOT NULL,
building text
);
I want to be able to store unique entries by the three columns "town", "street" and "building". If the third column "building" is NULL, it should be impossible to store any other row with the same "town" and "street" columns, no matter what is in the column "building".
So this should work:
---- example 1:
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 9);
---- example 2:
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', null);
But this should not:
---- example 1:
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 9);
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', null);
---- example 2:
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
I tried to use two partial indexes for this:
CREATE UNIQUE INDEX "two_cols"
ON "users" ("town", "street")
WHERE "building" IS NULL;
CREATE UNIQUE INDEX "three_cols"
ON "users" ("town", "street", "building")
WHERE "building" IS NOT NULL;
but the problem is that a filter only applies to its own index, so the first index checks only the rows matching its own predicate. This still allows storing a row with a NULL "building" next to rows with the same first two columns and non-null values, which is not what I need. Removing the filter from the first index disallows storing two rows with the same first two columns but different third columns. Is there a way to solve this problem?
2 Answers
Your first INSERT with the same data twice fails, and because both statements are treated as one transaction, a rollback happens, which erases both inserts. You can see this because the second insert works perfectly afterwards: building 8 is not present. When you make every insert its own transaction, it will enter the row, and the second one is no problem, as it is in its own transaction; when it rolls back, the previously inserted row is already saved.
So, in short: use transactions.
fiddle
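A minimal sketch of that idea, with each INSERT as its own transaction (run against the partial unique indexes from the question):

BEGIN;
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
COMMIT;
-- committed: this row stays, whatever happens next

BEGIN;
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8); -- fails: duplicate
ROLLBACK;
-- only this failed statement's work is discarded; the first row is already saved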
You want NULL to conflict with all values (incl. NULL). But distinct not-null values shall not conflict with each other. Try as you might, you won't cover that with a UNIQUE constraint (or index).
This does exactly what you ask for:
fiddle
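The fiddle itself is not reproduced here, but with the function and constraints sketched in the sections below in place, the question's examples should behave as requested:

INSERT INTO users (town, street, building) VALUES ('t', 's', 8);    -- OK
INSERT INTO users (town, street, building) VALUES ('t', 's', 9);    -- OK: distinct not-null values
INSERT INTO users (town, street, building) VALUES ('t', 's', NULL); -- rejected: overlaps both rows above
-- Starting from an empty table, inserting the NULL row first would
-- likewise block any later row with the same town and street.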
Constraints and indexes
A UNIQUE constraint is based on a B-tree index using equality checks at its core. An exclusion constraint is based on other index types (as of Postgres 16, GiST or SP-GiST) and can use additional operators – in particular the "overlaps" operator &&. But that's not meant for text values. To get there, hash the text to int8 and build an int8range from it, which allows &&.
I use the hash function hashtextextended(). You could use hashtext() to build an int4range instead. Smaller and a bit faster, but with increased chances of a hash collision. Both are built-in Postgres functions – undocumented, but reliable. See:
The exclusion constraint enforces your special flavor of "uniqueness" completely; the added unique constraint is logically redundant. I kept it anyway. Its underlying B-tree index typically helps the performance of many operations on the table. I implemented it with NULLS NOT DISTINCT since that is closer to your case. See:
Might also just be a plain index with the default NULLS DISTINCT. The optimal set of constraints and indexes depends on details of your setup.
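As a sketch, those two variants might look like this (constraint and index names are my own):

-- unique constraint treating NULLs as equal (Postgres 15+)
ALTER TABLE users
ADD CONSTRAINT users_town_street_building_uni
UNIQUE NULLS NOT DISTINCT (town, street, building);

-- or instead: a plain unique index with the default NULLS DISTINCT
CREATE UNIQUE INDEX users_town_street_building_idx
ON users (town, street, building);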
Building int8range from text
I encapsulated that task in the (optional) auxiliary function texthash_int8range(text). First, the text is hashed to a bigint (int8) with hashtextextended(). text NULL is mapped to int8 NULL, which happens to be what we need.
An int8range is built with it, containing only the one value. In range types, NULL means "unbounded", so the range for NULL input becomes the unbounded range, overlapping with all. See:
Now everything falls into place for your flavor of uniqueness. The probability of a hash collision is practically zero for moderately large cardinalities, but still possible. If that's an issue, this solution is not for you.
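A sketch of what that function might look like, using the minimal standard-SQL function form mentioned in the next section (Postgres 14+; the exact body in the fiddle may differ):

CREATE FUNCTION texthash_int8range(text)
  RETURNS int8range
  LANGUAGE sql IMMUTABLE PARALLEL SAFE
RETURN int8range(hashtextextended($1, 0)
               , hashtextextended($1, 0), '[]');
-- NULL input hashes to NULL, so both bounds are NULL ("unbounded"),
-- and the result is the unbounded range (,), which overlaps everything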
Building a multicolumn exclusion constraint
Since queries will be supported by the additional (faster) B-tree index, the purpose of the exclusion constraint is mostly just to enforce your rules. So I optimized for write performance and storage footprint rather than its utility as a versatile index, and condensed the leading two columns into a single hash with hash_record_extended((town, street), 0). That makes more sense the longer typical values are. For very short text values it might not pay.
We need the equality operator for that, which is not typically useful for GiST indexes, as B-tree indexes are better at that. But for the special case of a multicolumn constraint, we need it anyway. Postgres provides the required operator class(es) with the additional module btree_gist. Install that first. See:
To optimize, I use the minimal form of an SQL-standard function. (It can be a plain SQL function, too.) See:
But it must be IMMUTABLE. And it should be PARALLEL SAFE.
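Put together, a sketch of the exclusion constraint on the question's table (the constraint name is mine; details may differ from the fiddle):

CREATE EXTENSION IF NOT EXISTS btree_gist;  -- operator class for = on the int8 hash in GiST

ALTER TABLE users
ADD CONSTRAINT users_town_street_building_excl
EXCLUDE USING gist (
  (hash_record_extended((town, street), 0)) WITH =  -- both leading columns, condensed to one hash
, (texthash_int8range(building)) WITH &&            -- NULL building -> unbounded range, overlaps all
);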