Situation:
I have two models with a one-to-one relation, for example:
class User(models.Model):
    user_name = models.CharField(max_length=40)
    type = models.CharField(max_length=255)
    created_at = models.DateTimeField()
    ...

class Book(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
And I have around 200,000 records to delete.
- Language: Python
- Framework: Django
- Database: Postgres
Question:
- How can I delete the 200,000 records above with minimal cost?
Solution I have tried:
batch_size = 1000

# Fetch 200,000 user ids.
user_ids = list(
    User.objects.filter(
        type='sample',
        created_at__gte='2022-11-15 08:00',
        created_at__lt='2022-11-15 08:30',
    ).values_list('id', flat=True)[:200000]
)

# Delete in batches of 1,000.
for batch_start in range(0, len(user_ids), batch_size):
    batch_end = batch_start + batch_size
    with transaction.atomic():
        _, deleted = User.objects.filter(
            id__in=user_ids[batch_start:batch_end]
        ).delete()
With this solution, my server uses around:
- 600 MB CPU
- 300 MB RAM
and it takes more than 15 minutes to finish the workload.
I wonder, does anyone have a better solution?
3 Answers
Thanks, everyone. I have tried the solution with a raw query. It can remove 600,000 records in around 10 minutes, and the server used around:
By first principles, nothing beats raw SQL (versus a Django query) in terms of speed, because it operates closest to the database!

cursor.execute("DELETE FROM DB WHERE Column = %s", [value])  # pass the parameter for %s
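To make the raw-query idea concrete, here is a minimal standalone sketch. It uses sqlite3 as a stand-in for Postgres so it runs anywhere, and the myapp_user/myapp_book table names mirror Django's default naming and are assumptions, not taken from the question. One caveat the raw approach must handle: Django's on_delete=models.CASCADE is enforced in Python by the ORM, not by a database-level constraint, so a raw DELETE has to remove the Book rows before the User rows.

```python
import sqlite3

# Stand-in schema; table/column names follow Django's defaults
# ("myapp_user", "myapp_book") and are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_user (
        id INTEGER PRIMARY KEY,
        type TEXT,
        created_at TEXT
    );
    CREATE TABLE myapp_book (
        id INTEGER PRIMARY KEY,
        user_id INTEGER UNIQUE REFERENCES myapp_user(id)
    );
    INSERT INTO myapp_user VALUES (1, 'sample', '2022-11-15 08:10');
    INSERT INTO myapp_user VALUES (2, 'other',  '2022-11-15 08:10');
    INSERT INTO myapp_book VALUES (1, 1);
    INSERT INTO myapp_book VALUES (2, 2);
""")

# Delete child rows first: Django's CASCADE is applied by the ORM,
# so raw SQL must respect the foreign-key order itself.
conn.execute("""
    DELETE FROM myapp_book
    WHERE user_id IN (
        SELECT id FROM myapp_user
        WHERE type = 'sample'
          AND created_at >= '2022-11-15 08:00'
          AND created_at <  '2022-11-15 08:30'
    )
""")
conn.execute("""
    DELETE FROM myapp_user
    WHERE type = 'sample'
      AND created_at >= '2022-11-15 08:00'
      AND created_at <  '2022-11-15 08:30'
""")
conn.commit()
```

In a real Django project the same two statements would go through `connection.cursor()` with parameters passed as `%s` placeholders rather than interpolated.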
Or else you can do it with a queryset:

queryset = Model.objects.filter(field=value)
queryset.delete()
If you are using straight SQL, why not do a join on the user table with the date criteria to delete the books, and then delete all the users using the created_at criteria? Let the database do all the work!
Even without writing the join, a single bulk DELETE driven by the date criteria would be better than what you have.
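As a sketch of that join approach in Postgres (table names again follow Django's <app>_<model> defaults and are assumptions):

```sql
-- Delete the dependent Book rows via a join on the user criteria.
DELETE FROM myapp_book b
USING myapp_user u
WHERE b.user_id = u.id
  AND u.type = 'sample'
  AND u.created_at >= '2022-11-15 08:00'
  AND u.created_at <  '2022-11-15 08:30';

-- Then delete the users themselves with the same criteria.
DELETE FROM myapp_user
WHERE type = 'sample'
  AND created_at >= '2022-11-15 08:00'
  AND created_at <  '2022-11-15 08:30';
```

`DELETE ... USING` is Postgres-specific syntax; the two statements should run inside one transaction so the tables stay consistent.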