Context: I have this table called product
and I am writing a script to clean up some old products, but have to follow rules, which can be deleted. One of the rules, is to only delete it if its not referenced by some other specific tables.
So currently I find all the products (using other simler rules) and then try to filter that IDs list on all referenced products like:
SELECT id FROM product
WHERE
id NOT IN (SELECT product_id FROM sale_line)
AND id NOT IN (SELECT product_id FROM purchase_line)
On smaller number of products, it works. But in real scenario, there is over a million rows. And this query just takes forever.
Is there any more efficient way to determine if table row is referenced on some other specific tables, before trying to delete it?
2
Answers
I have my idea.
Gathering all product_id from both tb_sale_line and tb_purchase_line by distinct and save to tb_all_product_id (This table has all product id is pk, and row_num as key (auto increment))
Write console/script app for fetch product id every 100,1000, or 1M rows by specified range of row_num and search in old table by exists condition.
Typically NOT EXISTS is faster than NOT IN: