Postgresql - Postgres: Do not auto increment user "id" if "email" already exists?

BrettDoffing
March 16, 2024
75 views
0 votes
2 Answers

Having id as the primary key for a user when they sign up with an email, the id field is set to auto increment. It seems that when a user is trying to sign up with an email that already exists in the table, id will still increment, even though a new row is not created.

Is there a way I can prevent the id from incrementing if email is not unique?

Tags: postgresql

Answers

- Zegarek
- March 9, 2024 at 7:35 pm
- 0 votes
0
You can avoid some of the sequence skips due to rejected inserts if you replace the INSERT with a MERGE. It’s checking for matches before attempting the when not matched action, so it’ll skip to do nothing without attempting the insert entirely: Demo here
```
MERGE INTO users u
USING (values ('[email protected]')) t(email)
ON t.email = u.email
WHEN MATCHED THEN DO NOTHING
WHEN NOT MATCHED THEN
  INSERT (email)
  VALUES (t.email);
```
Nextval() just won’t happen. INSERT normally calls it and moves the sequence even if it’ll subsequently fail due to constraint violation or for any other reason. It’ll also call it regardless of ON CONFLICT actions.

A merge that fails or is rolled back after it ends up attempting to insert, can still give you gaps.

As established in the comments, you shouldn’t consider those gaps a problem. Nothing you’re supposed to use a primary key column for, should require that it’s continuous or even in any particular order. You also shouldn’t be concerned with stored row order unless you cluster or play column tetris.
1. If you think it’s a problem because you want something to indicate creation time, add a created_at timestamptz column with a default now().
2. If you want continuous ordinal numbers, set up a view that’ll give you select *,row_number()over(order by created_at), not unlike here.
3. If you plan to iterate over the table referencing consecutive id values, loop over a query or a cursor instead, if in PL/pgSQL. If in a recursive cte, you can give them a row_number() first, then iterate over that.
An id in a serial or identity column can’t be trusted to be continuous, or even ordered. If someone’s pet is named "3rd mouse", it usually means they have a well-fed cat, not 2 more mice. There’s also no guarantee the 3rd mouse isn’t older than the 2nd is (would’ve been, if it wasn’t for its untimely demise).

They’re normally sequential and continuous only as a side-effect: it’s just faster and easier to generate them like that, but the only goal is to keep them unique.

On top of the gaps you’re getting, you can also get out-of-order ids if there are multiple clients operating concurrently (and caching):
1. Client A grabs 1 and 2.
2. Client B grabs 3 and 4, fails and issues a rollback.
3. Client C grabs 5 and 6 and commits their insert before A.
4. Client A commits their 1 and 2.
5. Your id column now shows 5,6,1,2. The oldest row is 5, most recent is 2.
You can tighten (not remove) the gaps if you alter sequence pk_seq cache 1; but the order can still break:
1. A gets 1 and doesn’t commit yet.
2. B gets 2 and fails.
3. C gets 3 and commits immediately.
4. A commits 1.
5. Your id column now shows 3,1.
That, and any update, delete or default-id-overriding insert can also violate this fragile, accidental order.
Login or Signup to reply.

- Atmo
- March 16, 2024 at 3:13 am
- 0 votes
0
Just turning my comments into an answer to make it more readily accessible to future readers.

In the case of a user table and more generally in the case of any data a hacker would be interested in getting access to, use a randomly generated number; a sequence is bad security practice.
The predictable (although not perfectly because/thanks to the gaps) nature of sequential ids offers a vector for attackers to get access to unauthorized data.
Just Google uuid vs sequential id to get a sense of why this is important. You should be able to find reports about incidents that occurred partly because a critical identifier was a sequential id.

UUIDs are slower than sequence values and take more space too (they are 128-bit values).
However, such downside is easily offset by the security offered by random values (obviously, do not consider it to be the sole security measure you have to take).
Also, they have virtually no risk of ever colliding inside 1 table.

On a related note, UUID v7 (a time-ordered version of UUIDs, see here or the long developer thread for some explanation) are expected to be delivered to Postgres 17. I am not completely clear about that yet (and will update my answer if I find some time looking at it) but basically, it would allow you to order values chronologically while keeping them random too.

You most likely do not need/want the ability to order your users by when they registered though but I suppose there are situations where it is useful.

For columns where sequential ids are suitable, you should accept there will be gaps in the sequence.

As was commented, a sequence-based primary key column should not have any semantic meaning attached to it; if it indeed has no meaning, it should not be an issue for you to have gaps (just make sure you choose a big enough integer type to store all the values required in the future).
Note that in some DBMS (e.g. Oracle), there is even no ON UPDATE clause you can attach to a foreign key. This way, you cannot mess up your schema by causing the update of multiple records in multiple tables by updating just 1 record in a master table (that is, with a ON UPDATE CASCADE clause).

Gaps occur for a number of reasons:
- When a record is deleted.
- When sequence values are generated in a transaction that is ultimately rolled-back.
- When the database is stopped/restarted.
Asking a DB to remember what values are discarded or skipped would defeat some of the main selling points of sequences: performance and parallelism.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Postgres: Do not auto increment user "id" if "email" already exists?

Answers