I have 2 tables, 1 with ~1.5 million records and other with ~200.000, and i need to delete duplicates in the big table. I have one query and i think its correct, however can´t finish process because needs very much time. My question its about how long time needs?
My query:
DELETE jogar_totoloto2
FROM jogar_totoloto2, tbl_todos b
Where jogar_totoloto2.n1 = b.n1
and jogar_totoloto2.n2 = b.n2
and jogar_totoloto2.n3 = b.n3
and jogar_totoloto2.n4 = b.n4
and jogar_totoloto2.n5 = b.n5;
I have nice laptop with 16gb ram and amd ryzen 7 4800H processor.
With this pc should be faster or needs attention in config file, my.ini?
I’m waiting more than 30 min process finish and its not finished yet… what’s wrong???
3
Answers
INSERT is faster than DELETE. Will it be OK for you to keep the ‘needed’ data in a new table, delete the original table and rename the new table to original? (or TRUNCATE original and load data from new table)
You might need an index on some of the columns (possibly all, but if it’s a Lotto extraction with 5 numbers, it’s not useful to do a cardinality check – all columns will be equivalent).
For example
But to be sure, you ought to post the result of your query plan:
Perhaps, you might be better served by creating a new table and saving there the rows that do not match:
Then you can just delete the jogar_totoloto2 table and rename uniques as jogar_totoloto2, after verifying the contents are correct.
To me it’s somewhat strange to see the delete on a query. After all you just want to delete rows from one table where a row with the same values exists in another. That’s
WHERE
andIN
orEXISTS
for me:I don’t know whether this makes any difference in execution speed, though. MySQL will probably come up with thee same execution plan.
Now, how many of the 200,000 tbl_todos rows exist in jogar_totoloto2? All 200,000? That would be a lot and the query would run fastest, if there were no indexes on the table that must be maintained in the process. It can even pay to disable or remove indexes and drop triggers and enable / install them again after running the delete statement. The table jogar_totoloto2 should be run through sequentially row by row. An index on tbl_todos would help here to look up the rows there:
If, on the other hand, only few tbl_todos rows exist in jogar_totoloto2, then you’ll delete a rather small part of the table and you could benefit from an index that helps you find these rows quickly. The appropriate index would be:
The order of the columns in the indexes may matter. It is generally recommended to put the most selective column first. I don’t know whether this is important in MariaDB, though. Most selective means, if there are ten thousand different n4 in the table, but only two different n1, then put n4 before n1 in the index. So, the optimal index could look like this for instance: