Using Python==3.10.12
and sqlalchemy==2.0.23
(and postgres 12 under Ubuntu, if that makes any difference):
How do I serialize an existing SqlAlchemy query so I can deserialize it and fetch its results in another process?
In other words, I have a SqlAlchemy query
in process=1, but want to run for result in query.yield_per(1000): …
and query.count()
in process=2.
I’m not looking to serialize the query results. I’m looking to serialize & deserialize the query itself.
It is essential the result set is never buffered/materialized at any point, because it’s too large for RAM.
2
Answers
In process=1
In process=2
This solution is not 100% because
objects
is of typesqlalchemy.engine.result.ScalarResult
which has noobjects.count()
like the originalquery.count()
does. There's noobjects.rowcount
either. How do I get the result count out efficiently (server-side query)?One option is fetching the result count in process=1 and passing it to process=2 as an
int
, alongside thequery_sql
string. But that creates a race condition (count vs query out of sync), so I'd prefer to fetch them both in the same session.Usually people would want to split out sets of results into child processes and act on those results. Either by passing on offset/limit windows or batches of result ids. I guess I don’t understand the use-case. You can try this old serializer but I think the whole concept is going to be difficult to maintain and debug, ie. Pickling and unpickling sqlalchemy table objects and all the associated machinery.
Deprecated Extension
discussion about future of extension