skip to Main Content

I noticed that a query that used to be fast in legacy version of Django is now much slower in 4.0.8.

There is a fairly large table with a FK ‘marker’ and a boolean ‘flag’ that has an index attached. The following queries could reasonably return tens of thousands of rows.

In my codebase, there is a query like

MyModel.objects.filter(marker_id=123, flag=False).count()

In Django Debug Toolbar (and also in shell when I examine str(qs.query)) it now resolves to the following SQL syntax:

SELECT ••• FROM `myapp_mymodel` WHERE (`myapp_mymodel`.`marker_id` = 123 AND NOT `myapp_mymodel`.`flag`)

In extreme cases, this query runs for 20s or so. Meanwhile, in legacy Django version (1.11+) the same query becomes the following SQL:

SELECT ••• FROM `myapp_mymodel` WHERE (`myapp_mymodel`.`marker_id` = 123 AND `myapp_mymodel`.`flag` = 0)

This works, since the table schema contains ‘flag’ as a TINYINT(1), but most importantly, it works much faster – returning in under a second.

EDIT: I asked sql server to EXPLAIN both queries, and there is a difference in ‘flag’ appearing as a potential key in the latter (faster) query but not in the slower one. This is consistent with this answer stating that mysql needs to see comparison against a value to know to use an index. Thus, the main question becomes, how can I enforce the syntax that makes use of the index already in place?

END EDIT

Original questions: Why is the difference in ORM-to-SQL translation, and where can I find the code responsible (I have checked db.backends.mysql to no avail, or failed to recognize the culprit)? Is there a way to hint to Django that I’d much prefer the equals-zero behaviour?

The only workaround I see so far is to use raw SQL query. I’d rather avoid that if possible.

2

Answers


  1. This is a regression already reported and resolved in Django’s issue tracker being issue #32691.

    This is fixed in Django 4.1 so if you update it will get resolved automatically. For versions 3.2 to 4.0 you can use the workaround noted by Todor Velichkov on the issue which is to use Value() expressions:

    from django.db.models import Value
    
    
    MyModel.objects.filter(marker_id=123, flag=Value(0)).count()
    
    Login or Signup to reply.
  2. (MySQL/MariaDB, using ENGINE InnoDB)

    It depends on what the index(es) are.

    INDEX(flag)
    

    is likely to be used only if less than 20% of the rows match. (The 20% is approximate.)

    INDEX(flag, ...)
    

    where the extra columns can be used for filtering, is likely to be used for either TRUE or FALSE.

    Note that, in InnoDB, the PRIMARY KEY's column(s) are implicitly tacked onto the end of the index. This turns INDEX(flag) into INDEX(flag, id) possibly allowing the second form above to be relevant.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search