I have table towns which is main table. This table contains so many rows and it became so ‘dirty’ (someone inserted 5 milions rows) that I would like to get rid of unused towns.
There are 3 referent table that are using my town_id as reference to towns.
And I know there are many towns that are not used in this tables, and only if town_id
is not found in neither of these 3 tables I am considering it as inactive and I would like to remove that town (because it’s not used).
as you can see towns is used in this 2 different tables:
- employees
- offices
and for table * vendors there is vendor_id
in table towns since one vendor can have multiple towns.
so if vendor_id
in towns is null and town_id is not found in any of these 2 tables it is safe to remove it 🙂
I created a query which might work but it is taking tooooo much time to execute, and it looks something like this:
select count(*)
from towns
where vendor_id is null
and id not in (select town_id from banks)
and id not in (select town_id from employees)
So basically I said, if vendor_is
is null it means this town is definately not related to vendors and in the same time if same town is not in banks and employees, than it will be safe to remove it.. but query took too long, and never executed successfully...since towns has 5 milions rows and that is reason why it is so dirty..
In face I’m not able to execute given query since server terminated abnormally..
Here is full error message:
ERROR: server closed the connection unexpectedly This probably means
the server terminated abnormally before or while processing the
request.
Any kind of help would be awesome
Thanks!
3
Answers
You can try an JOIN on big tables it would be faster then two
IN
you could also try
UNION ALL
and live with the duplicates, as it is faster asUNION
Finally you can use a combined Index on id and vendor_id, to speed up the query
fiddle
You can join the tables using
LEFT JOIN
so that to identify thetown_id
for which there is no row in tablesbanks
andemployee
in theWHERE
clause :Before launching the DELETE, you can check the indexes on your tables.
Adding an index as follow can be usefull :
Last but not least you can add a
LIMIT
clause in thecte
so that to limit the number of rows you detele when you execute the DELETE and avoid the unexpected termination. As a consequence, you will have to relaunch the DELETE several times until there is no more row to delete.The trick is to first make a list of all the town_id’s you want to keep and then start removing those that are not there.
By looking in 2 tables you’re making life harder for the server so let’s just create 1 single list first.
The
idx_towns_town_id_where_vendor_null
is optional and I’m not sure if it will actaully lower the total time but IMHO it will help out with theDELETE
operation if only because the index should give the Query Optimizer a better view on what volumes to expect.