I have a table companies
in which there are some duplicated rows I found with a simple GROUP BY name
.
This companies
table has a PRIMARY KEY id
which is referenced in several other tables, under the name company_id
, which I also found with a query on the INFORMATION_SCHEMA
.
Now my requirement: Merge the two companies
into one, meaning update
all the referenced company_id
to reference the first id of the two, then delete the second company
All the FOREIGN KEYS
are already declared with ON UPDATE CASCADE
clause but I’m not sure if there’s a smart way of accomplishing the final result without manually updating all the referenced FOREIGN KEYS
My intuition was to
- create a new temporary field
mark_to_delete
and set this = 1 on all the duplicatedcompanies
I want to delete - use
FOREIGN_KEY_CHECKS=0
to avoid errors while updating acompany.ID
to an existing one - Delete
companies
that aremark_to_delete=1
- Turn back on
FOREIGN_KEY_CHECKS=1
Clearly my biggest doubt is on how FOREIGN_KEY_CHECKS
works, I’ve tried reading the docs and searching for my specific case, but I can’t understand if the ON UPDATE CASCADE
will work or not.
Worst case scenario I think would be to list all the referenced company ID’s, run updates on the tables with a referenced ID setting it to the "healthy" company, and then delete the duplicated company.
2
Answers
You don’t need to change
FOREIGN_KEY_CHECKS
and you don’t need to think aboutON UPDATE CASCADE
.What you call the "worst case scenario" is actually easier.
Identify duplicate companies. Say companies with
id
1, 2, 14, 39 are duplicates of each other.Pick one to use as the canonical row for that given company. Let’s say it’s 1.
Update any references in child tables to the other duplicate company ids:
This does not require changing
FOREIGN_KEY_CHECKS
, because thecompany_id
references satisfy the constraint before and after the UPDATE.This does not incur
ON UPDATE CASCADE
. Cascading is for when you update the referenced primary key, but you aren’t doing that. You’re changing the foreign keys in child table(s).Repeat for any other tables that reference
companies
.Once the duplicate rows in
companies
have no references in child tables, you are free to delete them.I recommend double-checking all the other columns of
companies
before choosing the canonical row. Any of them could have useful data (e.g. current phone number), and you’d have to read them and merge manually.The duplicates in the sample are the rows in
main
table with the samemain_name
value. So the subquery selects minimalmain_id
for all duplicates, and all references are replaced with this value.Complete sample fiddle