skip to Main Content

I’m trying to execute the following query using sqlalchemy:

query = session.query(
                func.count(ExampleTable.id)
            ).filter(ExampleTable.environment == self.environment))
if ids_list:
    query = query.filter(ExampleTable.id.in_(ids_list))

I noticed it was slow, so i did a simple timing "benchmark":

start = perf_counter()
query.first()
print(format(perf_counter() - start, '.3f'))  # .3f to limit decimal cases
  • It does not have to be precise, it is just for debugging purposes.

The average result of this perf_counter is always close to 2.7 seconds.

I also thought it could be the ExampleTable.environment == self.environment, since it is a string comparison, but the average result was around 0.03s, executing the same "benchmark" above with only the environment comparison.

  • ExampleTable has 150k ~ 200k total entries.
  • ids_list has 22k integer values.

With all that in mind, i did the same queries on MySQL Workbench (using the __ str __ method from Query object):

SELECT 
    COUNT(example_table.id) AS count_1
FROM
    example_table
WHERE
    example_table.environment = 'production'
        AND example_table.id IN (775008, 790152, 793346, ...) -- 22k ids

The first execution, which is usually slower, took 0.110 sec / 0.000 sec (duration / fetch)

The next executions took around 0.030 ~ 0.040 / 0.000 (duration / fetch)

SELECT
    COUNT(example_table.id) AS count_1 
FROM
    example_table 
WHERE
    example_table.environment = 'production';

First execution: 0.016 sec / 0.000 sec (duration / fetch)

Next executions: 0.000 / 0.000 (duration / fetch)

Since i’m running the same query on sqlalchemy and workbench, the duration should be at least close, right?

The difference between 2.7s~ to 0.100s~ is a LOT.

  1. Am i doing something wrong here? On python or mysql
  2. If the question lacks information, please let me know.

Edit1: The echo=True shows not exactly what i’d expect:

This is the first log shown about the query i’m debugging:

2024-07-15 12:13:58,594 INFO sqlalchemy.engine.Engine SELECT count(example_table.id) AS count_1 
FROM example_table
WHERE example_table.environment = %(environment_1)s AND  example_table.id IN (%(id_1_1)s, %(id_1_2)s, %(id_1_3)s, ..., (id_1_21800)s, %(id_1_21801)s) 
 LIMIT %(param_1)s

All these references seem odd %(id_1_1)s, %(id_1_2)s, ...

And aparently the second one defines the references:

2024-07-15 12:46:10,584 INFO sqlalchemy.engine.Engine [generated in 0.01642s] {'id_1': 1, 'environment_1': 'production', 'param_1': 1, 'id_1_1': 335469, 'id_1_2': 412304, 'id_1_3': 468745, ..., 'id_1_21800': 365095, 'id_1_21801': 731841}

Could be this slowness from these defined variables? Its one per id existing in .in_ clause


2

Answers


  1. Chosen as BEST ANSWER

    The problem was not the .in_ itself, but with the engine url i was using:

    url_obj = URL.create('mysql+mysqlconnector', username=username, password=password, host=host, database=database)
    

    Just changed it to mysql instead of mysql+mysqlconnector

    url_obj = URL.create('mysql', username=username, password=password, host=host, database=database)
    

    This made the query time drop from ~2.7s to ~0.09s


  2. I agree even taking into account any python string interpolation or something 2.7 seconds is a very large magnitude slower, 27 times.

    A few things to try:

    1. Make sure echo=True is on for the engine and check that the SQL being submitted is actually what you expect. (ie. a single query and not multiple queries, no other queries are firing, the SQL looks like what you expect)
    2. Copy the text manually into a python string outside the perf window and wrap it in text() and pass it to sqlalchemy
    3. Use that same string and pass it directly to the underlying driver connection (bypassing sqlalchemy). I don’t know the exact usage to fetch a result but this is a good starting place without needing to workup setting up a driver connection manually: working-with-driver-sql-and-raw-dbapi-connections
    4. Are you connecting over network versus the bench client is on the same server as the db itself?
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search