skip to Main Content

I get database table which contain postal numbers and regions for my country. That table have all information but i need to change it for my purpose.

I need to eliminate all rows that have duplicate content in specific column.

Check screenshot to see result
enter image description here

I want to remove all duplicate rows which have postanski_broj (postal_number) the some. That number need to be unique. I try manualy to set that column to unique but i get duplicate entry when i try to execute statment.

  • ID is primary key with auto increment.
  • postanski_broj column is VARCHAR which represent postal_code
  • naselje column is VARCHAR which represent region

One region can have one postal_code

I try

ALTER  TABLE poste ADD UNIQUE INDEX idx_postanski_br (postanski_broj);

00:03:20 ALTER TABLE poste ADD UNIQUE INDEX idx_postanski_br
(postanski_broj) Error Code: 1062. Duplicate entry ‘11158’ for key
‘idx_postanski_br’ 0.118 sec

ALTER IGNORE TABLE poste ADD UNIQUE INDEX idx_postanski_br (postanski_broj);

00:04:17 ALTER IGNORE TABLE poste ADD UNIQUE INDEX idx_postanski_br
(postanski_broj) Error Code: 1064. You have an error in your SQL
syntax; check the manual that corresponds to your MySQL server version
for the right syntax to use near ‘IGNORE TABLE poste ADD UNIQUE INDEX
idx_postanski_br (postanski_broj)’ at line 1 0.00037 sec

Anyone have sugestion? Thanks

2

Answers


  1. This solution can take too much time for big tables. Best way of solving this is: Remove duplicate rows in MySQL

    You have to delete the rows before applying the unique constraint. Be careful applying this:

    DELETE p1 FROM poste p1
    INNER JOIN poste p2 
    WHERE 
        p1.id < p2.id AND 
        p1.postanski_broj = p2.postanski_broj;
    

    This should remove the duplicated ones and will keep only the ones with the higher id (id=168044 in your example).

    Login or Signup to reply.
  2. If you have other columns with different values than the ones you’ve shown there (except for id), deleting should be your last choice.

    I usually would duplicate the table first:

    CREATE TABLE poste_new LIKE poste;
    

    add unique index to the newly created poste_new table:

    ALTER  TABLE poste_new ADD UNIQUE INDEX idx_postanski_br (postanski_broj);
    

    insert the data from poste into poste_new with IGNORE option to skip duplicates based on the unique index:

    INSERT IGNORE INTO poste_new SELECT * FROM poste;
    

    rename the tables:

    RENAME TABLE poste TO poste_old;
    RENAME TABLE poste_new TO poste;
    

    The good thing about this is that you’ve minimized the risk of wrong delete and if you’re not satisfied with the new table, you still have the old table intact – effectively making it a backup.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search