Struggling with a large dataset in my mariaDB database. I have two tables, where table A contains 57 million rows and table B contains around 500. Table B is a subset of ids related to a column in table A. I want to delete all rows from A which do not have a corresponding ID in table B.
Example table A:
classification_id | Name |
---|---|
20 | Mercedes |
30 | Kawasaki |
80 | Leitz |
70 | HP |
Example table B:
classification_id | Type |
---|---|
20 | car |
30 | bike |
40 | bus |
50 | boat |
So in this example the last two rows from table A would be deleted (or a mirror table would be made containing only the first two rows, thats also fine).
I tried to do the second one using an inner join but this query took a few minutes before giving an out of memory exception.
Any suggestions on how to tackle this?
2
Answers
try this:
Since you say that the filter table contains a relatively small number of rows, your best bet would be creating a separate table that contains the same columns as the original table
A
and the rows that match your criteria, then replace the original table and drop it. Also, with this number of IDs you probably want to useWHERE IN ()
instead of joins – as long as the field you’re using there is indexed, it will usually be way faster. Bringing it all together:Things to be aware of:
DROP TABLE
. You don’t want to lose 57M rows of data because of a random answer at StackOverflow.A
has any indexes or foreign keys, these won’t be copied over – so you’ll have to recreate them all manually. I’d recommend runningSHOW CREATE TABLE A
first and making note on its structure. Alternatively, you may consider creating the tablenew_A
explicitly using the output ofSHOW CREATE TABLE A
as a template and then performingINSERT INTO new_A SELECT ...
instead ofCREATE TABLE new_A AS SELECT ...
with the same query after this.