
Using Python==3.10.12 and sqlalchemy==2.0.23 (and postgres 12 under Ubuntu, if that makes any difference):

How do I serialize an existing SQLAlchemy query so I can deserialize it and fetch its results in another process?

In other words, I have a SQLAlchemy query in process=1, but want to run for result in query.yield_per(1000): … and query.count() in process=2.

I’m not looking to serialize the query results. I’m looking to serialize & deserialize the query itself.

It is essential the result set is never buffered/materialized at any point, because it’s too large for RAM.

2 Answers


  1. Chosen as BEST ANSWER

    In process=1

    from sqlalchemy.dialects import postgresql
    
    # Serialize: render the statement to a standalone SQL string,
    # inlining bound parameter values so it can be re-executed as-is.
    query_sql = str(
        query.statement.compile(
            dialect=postgresql.dialect(),
            compile_kwargs={"literal_binds": True},
        )
    )
    

    In process=2

    from sqlalchemy import text, select
    
    # Deserialize.
    # We use `from_statement` to yield full ORM objects, rather than
    # just Row tuples via `session.execute(text(query_sql))`.
    # `yield_per` only needs to be set once, here as an execution option.
    objects = session.execute(
        select(MyTable).from_statement(text(query_sql)),
        execution_options={'yield_per': 100},
    ).scalars()

    # Server-side cursor over the (large) query results,
    # fetched in batches of 100.
    for obj in objects:
        …
    

    This solution is not 100% complete, because objects is of type sqlalchemy.engine.result.ScalarResult, which has no objects.count() like the original query.count() does. There's no objects.rowcount either. How do I get the result count out efficiently (as a server-side query)?

    One option is fetching the result count in process=1 and passing it to process=2 as an int, alongside the query_sql string. But that creates a race condition (the count and the query results can drift out of sync between the two processes), so I'd prefer to fetch both in the same session.
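    One way to sketch the same-session approach on the process=2 side is to wrap the deserialized SQL in a count(*) subquery, so the count and the subsequent row iteration run in one session/transaction and cannot drift apart. The sqlite engine, table, and query_sql string below are toy stand-ins for the real Postgres setup:

    ```python
    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import Session

    # Stand-in engine and data; in the real setup this is the Postgres engine.
    engine = create_engine("sqlite://")
    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE my_table (id INTEGER PRIMARY KEY)"))
        conn.execute(text("INSERT INTO my_table (id) VALUES (1), (2), (3)"))

    # `query_sql` as produced in process=1 (hypothetical example statement).
    query_sql = "SELECT my_table.id FROM my_table WHERE my_table.id > 1"

    with Session(engine) as session:
        # Wrap the serialized SQL in a derived table; Postgres requires
        # an alias (`AS q`) on a subquery used in FROM.
        count = session.execute(
            text(f"SELECT count(*) FROM ({query_sql}) AS q")
        ).scalar_one()
        # The same session can then stream the actual rows.
    ```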


  2. Usually people want to split sets of results out to child processes and act on those results, either by passing offset/limit windows or batches of result ids, so I guess I don't understand the use-case. You can try this old serializer, but I think the whole concept is going to be difficult to maintain and debug, i.e. pickling and unpickling SQLAlchemy table objects and all the associated machinery.

    Deprecated Extension

    discussion about future of extension
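    For reference, the deprecated extension (sqlalchemy.ext.serializer) is used roughly like this; the in-memory sqlite engine and MyTable model below are toy stand-ins:

    ```python
    from sqlalchemy import Integer, String, create_engine, select
    from sqlalchemy.ext import serializer
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    class Base(DeclarativeBase):
        pass

    class MyTable(Base):
        __tablename__ = "my_table"
        id: Mapped[int] = mapped_column(Integer, primary_key=True)
        name: Mapped[str] = mapped_column(String)

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add_all([MyTable(name="a"), MyTable(name="b")])
        session.commit()

        query = select(MyTable).where(MyTable.name == "a")

        # dumps() pickles the statement; tables/columns are stored by
        # name rather than by value.
        payload = serializer.dumps(query)

        # loads() re-binds those names against the given MetaData, so
        # the receiving process must have the same schema defined.
        restored = serializer.loads(payload, Base.metadata)

        rows = session.execute(restored).scalars().all()
    ```

    Note the caveat from the docs: this only serializes the query structure, so both sides must share the model/MetaData definitions, which is part of why it is hard to maintain.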
