skip to Main Content

When a queryset is sorted by a field in a related model performance decreases drastically. I use mysql.

For example, I have two models:

class Event(models.Model):
    idEvent = models.BigAutoField(primary_key=True)
    created_at = models.DateTimeField(db_index=True, verbose_name=_('date'))
    processed_at = models.DateTimeField(auto_now_add=True, verbose_name=_('processed'))
    data = models.TextField()

    class Meta:
        ordering = ["-created_at"]

# [1154519 rows]
class AccessHistory(models.Model):
    event = models.ForeignKey(Event, on_delete=models.CASCADE, blank=True, null=True)
    result = models.TextField(verbose_name=_('result'))

# [1130603 rows]

If I do AccessHistory.objects.all().select_related('event').order_by('-event__created_at') the query delays over 5s, if I swap select_related with prefetch_related I get the same delay. When I do the query without ordering it responds in the expected time (<1s)

-- AccessHistory.objects.all().select_related('event').order_by('-event__created_at')
SELECT `history_accesshistory`.`id`, 
`history_accesshistory`.`event_id`,  
`history_accesshistory`.`result`, 
`history_event`.`idEvent`, 
`history_event`.`created_at`, 
`history_event`.`processed_at`, 
`history_event`.`data`
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`) 
ORDER BY `history_event`.`created_at` DESC
-- AccessHistory.objects.all().prefetch_related('event').order_by('-event__created_at')
SELECT `history_accesshistory`.`id`, 
`history_accesshistory`.`event_id`, 
`history_accesshistory`.`result` 
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`) 
ORDER BY `history_event`.`created_at` DESC

I tried with select_related, prefetch_related, indexing created_at field in Event model and setting a default order in Event model. Nothing of this improves the response time.

How I can optimize this without moving/copying created_at field to AccessHistory model?

EDIT 1:
I use this queryset in django-tables2 view. This is why there is no limit.

EDIT 2:
@Rick James thanks for the answer. Now I understand the problem but I don’t found how to solve it. The relation is not one to one, every AccessHistory has an Event but there are Events without AccessHistory. With INNER JOIN or JOIN it works as expected but I don’t know how to do the query with django ORM.
I tried with AccessHistory.objects.select_related('event') but it uses a LEFT OUTER JOIN:

SELECT `history_accesshistory`.`id`, 
`history_accesshistory`.`event_id`, 
`history_accesshistory`.`result`, 
`history_event`.`idEvent`, 
`history_event`.`created_at`, 
`history_event`.`processed_at`, 
`history_event`.`data`, 
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`)

EDIT:
@Paul Spiegel LEFT JOIN is not required but I don’t know how to avoid it using QuerySet API.

3

Answers


  1. Chosen as BEST ANSWER

    Thanks to @Rick James and @Paul Spiegel for pointing me in the right direction. The problem was the blank=True and Null=True in event field.

    Now with

    class AccessHistory(models.Model):
        event = models.ForeignKey(Event, on_delete=models.CASCADE, blank=False, null=False)
        result = models.TextField(verbose_name=_('result'))
    

    It works as expected:

    -- AccessHistory.objects.all().select_related('event').order_by('-event__created_at')
    SELECT `history_accesshistory`.`id`, 
    `history_accesshistory`.`event_id`, 
    `history_accesshistory`.`result`, 
    `history_event`.`idEvent`, 
    `history_event`.`created_at`, 
    `history_event`.`processed_at`, 
    `history_event`.`data`
    FROM `history_accesshistory` INNER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`) 
    ORDER BY `history_event`.`created_at` DESC
    

  2. What you may be running into according to the docs:

    By default, indexes are created with an ascending order for each column.

    models.DateTimeField(db_index=True) may be setting an ASC index, but you’re querying for the reverse. Note that Django 5.0 docs are currently suggesting to use Meta.indexes instead of Field.db_index, too.

    Try adding the index like this:

    class Event(models.Model):
        ...
    
        class Meta:
            indexes = [
                models.Index(fields=['-created_at'])
            ]
    

    QuerySet.explain() may be helpful in validating which indexes are actually being used by your queries as well.

    Login or Signup to reply.
  3. As written, both of your queries will involve

    1. A full scan of accesshistory.
    2. Reach into event, preferably via INDEX(idEvent, created_at). (Add that if you don’t already have it.)
    3. Sort by created_at.
    4. Deliver all the data (or less if you have LIMIT).

    On the other hand, if there is exactly one "event" for each "accesshistory" row, change LEFT OUTER JOIN to JOIN. This will allow the MySQL Optimizer to "start" with event. That may avoid the "sort" that ORDER BY needs). In this case, it might scan through event in reverse order using INDEX(created_at ASC). For each row, it would then look up the item(s) in accesshistory.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search