When a queryset is sorted by a field in a related model performance decreases drastically. I use mysql.
For example, I have two models:
class Event(models.Model):
idEvent = models.BigAutoField(primary_key=True)
created_at = models.DateTimeField(db_index=True, verbose_name=_('date'))
processed_at = models.DateTimeField(auto_now_add=True, verbose_name=_('processed'))
data = models.TextField()
class Meta:
ordering = ["-created_at"]
# [1154519 rows]
class AccessHistory(models.Model):
event = models.ForeignKey(Event, on_delete=models.CASCADE, blank=True, null=True)
result = models.TextField(verbose_name=_('result'))
# [1130603 rows]
If I do AccessHistory.objects.all().select_related('event').order_by('-event__created_at')
the query delays over 5s, if I swap select_related with prefetch_related I get the same delay. When I do the query without ordering it responds in the expected time (<1s)
-- AccessHistory.objects.all().select_related('event').order_by('-event__created_at')
SELECT `history_accesshistory`.`id`,
`history_accesshistory`.`event_id`,
`history_accesshistory`.`result`,
`history_event`.`idEvent`,
`history_event`.`created_at`,
`history_event`.`processed_at`,
`history_event`.`data`
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`)
ORDER BY `history_event`.`created_at` DESC
-- AccessHistory.objects.all().prefetch_related('event').order_by('-event__created_at')
SELECT `history_accesshistory`.`id`,
`history_accesshistory`.`event_id`,
`history_accesshistory`.`result`
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`)
ORDER BY `history_event`.`created_at` DESC
I tried with select_related, prefetch_related, indexing created_at field in Event model and setting a default order in Event model. Nothing of this improves the response time.
How I can optimize this without moving/copying created_at field to AccessHistory model?
EDIT 1:
I use this queryset in django-tables2 view. This is why there is no limit.
EDIT 2:
@Rick James thanks for the answer. Now I understand the problem but I don’t found how to solve it. The relation is not one to one, every AccessHistory has an Event but there are Events without AccessHistory. With INNER JOIN or JOIN it works as expected but I don’t know how to do the query with django ORM.
I tried with AccessHistory.objects.select_related('event')
but it uses a LEFT OUTER JOIN:
SELECT `history_accesshistory`.`id`,
`history_accesshistory`.`event_id`,
`history_accesshistory`.`result`,
`history_event`.`idEvent`,
`history_event`.`created_at`,
`history_event`.`processed_at`,
`history_event`.`data`,
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`)
EDIT:
@Paul Spiegel LEFT JOIN is not required but I don’t know how to avoid it using QuerySet API.
3
Answers
Thanks to @Rick James and @Paul Spiegel for pointing me in the right direction. The problem was the
blank=True
andNull=True
in event field.Now with
It works as expected:
What you may be running into according to the docs:
models.DateTimeField(db_index=True)
may be setting an ASC index, but you’re querying for the reverse. Note that Django 5.0 docs are currently suggesting to useMeta.indexes
instead ofField.db_index
, too.Try adding the index like this:
QuerySet.explain()
may be helpful in validating which indexes are actually being used by your queries as well.As written, both of your queries will involve
accesshistory
.event
, preferably viaINDEX(idEvent, created_at)
. (Add that if you don’t already have it.)created_at
.LIMIT
).On the other hand, if there is exactly one "event" for each "accesshistory" row, change
LEFT OUTER JOIN
toJOIN
. This will allow the MySQL Optimizer to "start" withevent
. That may avoid the "sort" thatORDER BY
needs). In this case, it might scan throughevent
in reverse order usingINDEX(created_at ASC)
. For each row, it would then look up the item(s) inaccesshistory
.