I noticed that a query that used to be fast in legacy version of Django is now much slower in 4.0.8.
There is a fairly large table with a FK ‘marker’ and a boolean ‘flag’ that has an index attached. The following queries could reasonably return tens of thousands of rows.
In my codebase, there is a query like
MyModel.objects.filter(marker_id=123, flag=False).count()
In Django Debug Toolbar (and also in shell when I examine str(qs.query)
) it now resolves to the following SQL syntax:
SELECT ••• FROM `myapp_mymodel` WHERE (`myapp_mymodel`.`marker_id` = 123 AND NOT `myapp_mymodel`.`flag`)
In extreme cases, this query runs for 20s or so. Meanwhile, in legacy Django version (1.11+) the same query becomes the following SQL:
SELECT ••• FROM `myapp_mymodel` WHERE (`myapp_mymodel`.`marker_id` = 123 AND `myapp_mymodel`.`flag` = 0)
This works, since the table schema contains ‘flag’ as a TINYINT(1), but most importantly, it works much faster – returning in under a second.
EDIT: I asked sql server to EXPLAIN both queries, and there is a difference in ‘flag’ appearing as a potential key in the latter (faster) query but not in the slower one. This is consistent with this answer stating that mysql needs to see comparison against a value to know to use an index. Thus, the main question becomes, how can I enforce the syntax that makes use of the index already in place?
END EDIT
Original questions: Why is the difference in ORM-to-SQL translation, and where can I find the code responsible (I have checked db.backends.mysql to no avail, or failed to recognize the culprit)? Is there a way to hint to Django that I’d much prefer the equals-zero behaviour?
The only workaround I see so far is to use raw SQL query. I’d rather avoid that if possible.
2
Answers
This is a regression already reported and resolved in Django’s issue tracker being issue #32691.
This is fixed in Django 4.1 so if you update it will get resolved automatically. For versions 3.2 to 4.0 you can use the workaround noted by Todor Velichkov on the issue which is to use
Value()
expressions:(MySQL/MariaDB, using ENGINE InnoDB)
It depends on what the index(es) are.
is likely to be used only if less than 20% of the rows match. (The 20% is approximate.)
where the extra columns can be used for filtering, is likely to be used for either TRUE or FALSE.
Note that, in InnoDB, the
PRIMARY KEY's
column(s) are implicitly tacked onto the end of the index. This turnsINDEX(flag)
intoINDEX(flag, id)
possibly allowing the second form above to be relevant.