"Filtering" huge MariaDB/Mysql table based on different table

niedem
November 29, 2022
150 views
0 votes
2 Answers

Struggling with a large dataset in my mariaDB database. I have two tables, where table A contains 57 million rows and table B contains around 500. Table B is a subset of ids related to a column in table A. I want to delete all rows from A which do not have a corresponding ID in table B.

Example table A:

classification_id	Name
20	Mercedes
30	Kawasaki
80	Leitz
70	HP

Example table B:

classification_id	Type
20	car
30	bike
40	bus
50	boat

So in this example the last two rows from table A would be deleted (or a mirror table would be made containing only the first two rows, thats also fine).

I tried to do the second one using an inner join but this query took a few minutes before giving an out of memory exception.

Any suggestions on how to tackle this?

Tags: mariadb mysql

Answers

- DivyaPrakash
- November 29, 2022 at 5:37 pm
- 0 votes
0
try this:
```
delete from "table A" where classification_id not in (select classification_id from "table B");
```
Login or Signup to reply.

- SergeyKudriavtsev
- November 30, 2022 at 12:11 pm
- 0 votes
0
Since you say that the filter table contains a relatively small number of rows, your best bet would be creating a separate table that contains the same columns as the original table A and the rows that match your criteria, then replace the original table and drop it. Also, with this number of IDs you probably want to use WHERE IN () instead of joins – as long as the field you’re using there is indexed, it will usually be way faster. Bringing it all together:
```
CREATE TABLE new_A AS
  SELECT A.* FROM A 
  WHERE classification_id IN (SELECT classification_id FROM B);
RENAME TABLE A TO old_A, new_A to A;
DROP TABLE old_A;
```
Things to be aware of:
- Backup your data! And test the queries thoroughly before running that DROP TABLE. You don’t want to lose 57M rows of data because of a random answer at StackOverflow.
- If A has any indexes or foreign keys, these won’t be copied over – so you’ll have to recreate them all manually. I’d recommend running SHOW CREATE TABLE A first and making note on its structure. Alternatively, you may consider creating the table new_A explicitly using the output of SHOW CREATE TABLE A as a template and then performing INSERT INTO new_A SELECT ... instead of CREATE TABLE new_A AS SELECT ... with the same query after this.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.