skip to Main Content

I have 2 tables, 1 with ~1.5 million records and other with ~200.000, and i need to delete duplicates in the big table. I have one query and i think its correct, however can´t finish process because needs very much time. My question its about how long time needs?

My query:

DELETE jogar_totoloto2
FROM jogar_totoloto2, tbl_todos b
Where jogar_totoloto2.n1 = b.n1
and jogar_totoloto2.n2 = b.n2
and jogar_totoloto2.n3 = b.n3
and jogar_totoloto2.n4 = b.n4
and jogar_totoloto2.n5 = b.n5;

I have nice laptop with 16gb ram and amd ryzen 7 4800H processor.

With this pc should be faster or needs attention in config file, my.ini?

I’m waiting more than 30 min process finish and its not finished yet… what’s wrong???

3

Answers


  1. INSERT is faster than DELETE. Will it be OK for you to keep the ‘needed’ data in a new table, delete the original table and rename the new table to original? (or TRUNCATE original and load data from new table)

    Login or Signup to reply.
  2. You might need an index on some of the columns (possibly all, but if it’s a Lotto extraction with 5 numbers, it’s not useful to do a cardinality check – all columns will be equivalent).

    For example

    CREATE INDEX jt_ndx ON jogar_totoloto2(n1);
    CREATE INDEX tt_ndx ON tbl_todos(n1);
    

    But to be sure, you ought to post the result of your query plan:

    EXPLAIN DELETE ... /* the rest of your DELETE query */
    

    Perhaps, you might be better served by creating a new table and saving there the rows that do not match:

    CREATE TABLE uniques LIKE jogar_totoloto2;
    
    INSERT INTO uniques SELECT a.*
        FROM jogar_totoloto2 AS a 
        LEFT JOIN tbl_todos AS b
        USING (n1, n2, n3, n4, n5)
        WHERE b.n1 IS NULL;
    

    Then you can just delete the jogar_totoloto2 table and rename uniques as jogar_totoloto2, after verifying the contents are correct.

    Login or Signup to reply.
  3. To me it’s somewhat strange to see the delete on a query. After all you just want to delete rows from one table where a row with the same values exists in another. That’s WHERE and IN or EXISTS for me:

    DELETE FROM jogar_totoloto2
    WHERE (n1, n2, n3, n4, n5) IN (SELECT n1, n2, n3, n4, n5 FROM tbl_todos);
    

    I don’t know whether this makes any difference in execution speed, though. MySQL will probably come up with thee same execution plan.

    Now, how many of the 200,000 tbl_todos rows exist in jogar_totoloto2? All 200,000? That would be a lot and the query would run fastest, if there were no indexes on the table that must be maintained in the process. It can even pay to disable or remove indexes and drop triggers and enable / install them again after running the delete statement. The table jogar_totoloto2 should be run through sequentially row by row. An index on tbl_todos would help here to look up the rows there:

    CREATE INDEX idx1 ON tbl_todos (n1, n2, n3, n4, n5);
    

    If, on the other hand, only few tbl_todos rows exist in jogar_totoloto2, then you’ll delete a rather small part of the table and you could benefit from an index that helps you find these rows quickly. The appropriate index would be:

    CREATE INDEX idx2 ON jogar_totoloto2 (n1, n2, n3, n4, n5);
    

    The order of the columns in the indexes may matter. It is generally recommended to put the most selective column first. I don’t know whether this is important in MariaDB, though. Most selective means, if there are ten thousand different n4 in the table, but only two different n1, then put n4 before n1 in the index. So, the optimal index could look like this for instance:

    CREATE INDEX idx ON tablename (n4, n5, n3, n1, n2);
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search