We (almost) always add a general id
column as a primary key while designing sql tables. If a table already contains a unique column, I wonder if we should designate that as the primary key and not use the id
column or still use it while just adding a unique
key index for the unique column. And what would be the implications of either of the approaches mentioned above?
We recently had a case where a table had a UUID designated as the primary key but the said column led to certain issues when mysqldump
was used to dump the database. The said table also has a non-null unique column of type VARCHAR
so to work around the aforementioned issue, the id
column was removed and instead, the unique key column was designated as the primary key.
This made me wonder if we should always do this or only in specific cases and if it would impact the operations on that table in any way.
Some of the questions that come to mind are:
- Would it affect the performance? If yes, what if the said table has a lesser number of records, eg., less than a hundred?
- The said unique column being a string-type column may contain anything. So, would that result in any issue while joining it with another table via a primary-foreign key relationship?
Any help in this regard is highly appreciated.
2
Answers
Here is a list of considerations for using an existing unique column as primary key instead of adding a special column solely for that purpose:
2 out of 3 tables I have written have a "natural"
PRIMARY KEY
. I avoid artificially adding anid
to such tables.Here are some common patterns. (
id
is AUTO_INCREMENT`)A uuid is very random, hence this may be optimal:
For many-to-mapping table between foo and bar, this saves both space and speed, while enforcing that you don’t accidentally insert duplicates. It is implemented (in InnoDB) as two 2-column BTrees:
Sometimes the artificial id is beneficial, but this can let you use the PK for "locality of reference":
As for your specific questions:
TEXT
andBLOB
cannot be indexed, and prefixing (eg,INDEX(ext(50))
does not really address your concerns.PARTITIONing
a table adds more cans-of-worms.INSERTing
a row must wait for all PRIMARY and UNIQUE keys to be checked for dups before completing. Other secondary keys are inserted in a ‘delayed’ technique (using the "Change Buffer"). That is, there is a slight benefit in minimizing the number of unique keys in a table.