
Using Python==3.10.12 and sqlalchemy==2.0.23 (and postgres 12 under Ubuntu, if that makes any difference):

How do I serialize an existing SQLAlchemy query so I can deserialize it and fetch its results in another process?

In other words, I have a SQLAlchemy query in process=1, but want to run for result in query.yield_per(1000): … and query.count() in process=2.

I’m not looking to serialize the query results. I’m looking to serialize & deserialize the query itself.

It is essential the result set is never buffered/materialized at any point, because it’s too large for RAM.

2 Answers


  1. Chosen as BEST ANSWER

    In process=1

    from sqlalchemy.dialects import postgresql
    
    # Serialize: render the statement to a standalone SQL string,
    # inlining bound parameter values so it can be re-executed as-is.
    query_sql = str(
        query.statement.compile(
            dialect=postgresql.dialect(),
            compile_kwargs={"literal_binds": True},
        )
    )
    

    In process=2

    from sqlalchemy import text, select
    
    # Deserialize.
    # We use `from_statement` to yield full ORM objects, rather than
    # just Row tuples via `session.execute(text(query_sql))`.
    # `yield_per` only needs to be set once, here as an execution option.
    objects = session.execute(
        select(MyTable).from_statement(text(query_sql)),
        execution_options={'yield_per': 100},
    ).scalars()

    # Server-side cursor over the (large) query results,
    # fetched in batches of 100.
    for obj in objects:
        …
    

    This solution is not 100% complete, because objects is of type sqlalchemy.engine.result.ScalarResult, which has no objects.count() like the original query.count() does. There's no objects.rowcount either. How do I get the result count out efficiently (as a server-side query)?

    One option is fetching the result count in process=1 and passing it to process=2 as an int, alongside the query_sql string. But that creates a race condition (the count and the query results can drift out of sync between the two processes), so I'd prefer to fetch both in the same session.
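    One way to sketch the same-session approach on the process=2 side is to wrap the deserialized SQL in a count(*) subquery, so the count and the subsequent row iteration run in one session/transaction and cannot drift apart. The sqlite engine, table, and query_sql string below are toy stand-ins for the real Postgres setup:

    ```python
    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import Session

    # Stand-in engine and data; in the real setup this is the Postgres engine.
    engine = create_engine("sqlite://")
    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE my_table (id INTEGER PRIMARY KEY)"))
        conn.execute(text("INSERT INTO my_table (id) VALUES (1), (2), (3)"))

    # `query_sql` as produced in process=1 (hypothetical example statement).
    query_sql = "SELECT my_table.id FROM my_table WHERE my_table.id > 1"

    with Session(engine) as session:
        # Wrap the serialized SQL in a derived table; Postgres requires
        # an alias (`AS q`) on a subquery used in FROM.
        count = session.execute(
            text(f"SELECT count(*) FROM ({query_sql}) AS q")
        ).scalar_one()
        # The same session can then stream the actual rows.
    ```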


  2. Usually people want to split sets of results out to child processes and act on those results, either by passing offset/limit windows or batches of result ids, so I guess I don't understand the use-case. You can try this old serializer, but I think the whole concept is going to be difficult to maintain and debug, i.e. pickling and unpickling SQLAlchemy table objects and all the associated machinery.

    Deprecated Extension

    discussion about future of extension
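    For reference, the deprecated extension (sqlalchemy.ext.serializer) is used roughly like this; the in-memory sqlite engine and MyTable model below are toy stand-ins:

    ```python
    from sqlalchemy import Integer, String, create_engine, select
    from sqlalchemy.ext import serializer
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    class Base(DeclarativeBase):
        pass

    class MyTable(Base):
        __tablename__ = "my_table"
        id: Mapped[int] = mapped_column(Integer, primary_key=True)
        name: Mapped[str] = mapped_column(String)

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add_all([MyTable(name="a"), MyTable(name="b")])
        session.commit()

        query = select(MyTable).where(MyTable.name == "a")

        # dumps() pickles the statement; tables/columns are stored by
        # name rather than by value.
        payload = serializer.dumps(query)

        # loads() re-binds those names against the given MetaData, so
        # the receiving process must have the same schema defined.
        restored = serializer.loads(payload, Base.metadata)

        rows = session.execute(restored).scalars().all()
    ```

    Note the caveat from the docs: this only serializes the query structure, so both sides must share the model/MetaData definitions, which is part of why it is hard to maintain.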
