I have a project that involves two Docker containers: a web server that interacts with a client, and a worker server that handles background processing. The web server sends data over to the worker via Redis. The project structure looks like this:
```
.
├── docker-compose.yml
├── web_server
│   └── app
│       ├── Dockerfile
│       ├── app
│       │   ├── main.py
│       │   ├── schemas
│       │   └── other_stuff
│       ├── requirements.txt
│       └── scripts
│
└── worker
    └── app
        ├── Dockerfile
        ├── app
        │   ├── run_worker.py
        │   └── other_stuff
        ├── requirements.txt
        └── scripts
```
I am using Pydantic models defined in web_server/app/schemas/ to validate requests and responses from the web server. Those requests are turned into dictionaries when they're pulled from Redis; the worker deletes and adds fields on those dictionaries during processing, and finally the web server takes the processed data back out of Redis and loads it into a Pydantic model called ResponseModel to give back to the client. I'm wondering if there's a better way to do this, perhaps by using the schema files in the worker server too, so that I get type hints and there's less chance of accessing fields that don't exist or creating fields that aren't in ResponseModel.
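For concreteness, here's roughly what the current round trip looks like (a sketch; the field names are made up). The worker only sees plain dicts, so a typo goes unnoticed until the web server finally builds ResponseModel:

```python
import json
from pydantic import BaseModel

class ResponseModel(BaseModel):
    job_id: str
    result: str

raw = json.dumps({"job_id": "1", "input_data": "..."})  # as pulled from Redis

payload = json.loads(raw)
del payload["input_data"]
payload["reslut"] = "42"      # silent typo: nothing catches it here

ResponseModel(**payload)      # fails only now, back on the web server
```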
2 Answers
One safe way to let the client benefit from the server's models is to embrace the OpenAPI side of FastAPI. That way the client is not tightly coupled to a specific version of the server code (and does not have to be Python). The generated client has all the properties, though probably not the validation logic.
There are tools that can generate client classes/libraries from the OpenAPI (formerly Swagger) spec that FastAPI automatically serves on an endpoint (/openapi.json by default), and the FastAPI docs mention several.
The drawback is that Python community support for OpenAPI tooling is a bit sparse, and you really do need to start versioning your URLs so that your client can detect (and survive) the server diverging from the client's expectations. In the single-repo scenario above you also get concept duplication.
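A minimal sketch of the server side of this approach: declare the response model on a versioned router so the schema lands in the generated spec. ResponseModel, the route, and the /v1 prefix are all illustrative here:

```python
from fastapi import APIRouter, FastAPI
from pydantic import BaseModel

class ResponseModel(BaseModel):
    job_id: str
    result: str

v1 = APIRouter(prefix="/v1")

@v1.get("/jobs/{job_id}", response_model=ResponseModel)
def get_job(job_id: str) -> ResponseModel:
    return ResponseModel(job_id=job_id, result="...")

app = FastAPI()
app.include_router(v1)
# FastAPI serves the spec at /openapi.json; tools such as
# openapi-python-client can generate a typed client from it.
```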
The simplest approach is what you already propose: the worker imports the same Pydantic models and pushes the requests and responses through those models, giving you validation on the worker side too, albeit with some overhead. It is tightly coupled and single-ecosystem, but there's no code generation to do.
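A minimal sketch of the worker under this approach, assuming the models live in a package both images can import (e.g. copied into each build context or installed from a shared location). TaskModel, the queue names, and the Redis host are illustrative:

```python
import redis
from pydantic import BaseModel

# In practice this would be: from schemas.task import TaskModel
class TaskModel(BaseModel):
    job_id: str
    payload: str
    status: str = "pending"

r = redis.Redis(host="redis", port=6379)  # "redis" = compose service name

while True:
    _, raw = r.blpop("tasks")                    # blocks until a job arrives
    task = TaskModel.model_validate_json(raw)    # Pydantic v2; v1: parse_raw()
    task.status = "done"                         # attribute access, not dict keys
    r.lpush("results", task.model_dump_json())   # v1: task.json()
```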
Consider a language-agnostic format like Protocol Buffers, Avro, or Thrift. For starters you can keep the schema in a shared repo and import it as a git submodule into any service that needs it. You'll need a strategy for schema evolution to ensure there are no breaking changes to the shared schema.
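A sketch of what that flow could look like in Python, assuming protoc has already generated task_pb2.py from a task.proto in the shared repo (the Task message and its fields are made up here):

```python
# Assumes `protoc --python_out=. task.proto` has produced task_pb2.py
# from the shared schema; Task and its fields are hypothetical.
from task_pb2 import Task

msg = Task(job_id="abc123", payload="...")
data = msg.SerializeToString()     # compact bytes, safe to push through Redis

# Any service compiled against the same .proto can decode it, in any language:
decoded = Task.FromString(data)
print(decoded.job_id)
```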
If you want something more sophisticated, take a look at Buf.build and their tooling (the BSR and the Buf CLI), which facilitates sharing message formats at scale.