I’ve been studying AWS ECS, but I cannot understand how to architect the following scenario:
I’m trying to run hundreds of algorithms at once in real-time:
- Each algorithm needs to run in a single isolated container
- Algorithms share the same common code and environment setup; only one file differs in each algorithm (let's call it the "algo-file"), containing custom code unique to that algorithm
- Algorithms are long-running – they should stay perpetually live until they get a termination signal from a main instance that I have built
- If an algorithm's container fails, ECS should replace it with a healthy container
- No two "algo-files" are the same across algorithms, even though all other files are identical
- I keep track of which "algo-files" have been deployed (and which have not) in a central database
Each container should also be able to interact with a common database, call external APIs, and receive requests from internal EC2 instances.
Any suggestions on how this may be architected?
2 Answers
First, a disclaimer: I think containers are a solution in search of a problem, so I wouldn't start with ECS in the first place.
What I would do is launch EC2 instances:
aws ec2 run-instances --image-id <ami-id> --count 100 --user-data file://bootstrap.sh
and in the user-data have some call to a service, say
curl https://example/gen-algofile/ -o /var/opt/algo-file
with the service figuring out which algo-file to send, probably by tracking a list of files that have already been sent. You need to give more information about how you track these algo-files. The service could be API Gateway or even an external (non-AWS) service.
UPDATE: your updates to the question make it much more confusing (no wonder somebody has already suggested closing it). Your question is already on thin ice as off-topic (strictly speaking, it is not code-related) and you are only making it worse by piling on terms like "services", "tasks", "API", etc.
I answered the first straightforward question – "How do I run multiple instances that are almost the same"; you may want to revert to that simple question.
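For what it's worth, a minimal bootstrap.sh user-data sketch for that simple version might look like the following. Everything here is an assumption for illustration: the gen-algofile URL is the placeholder from above, and /opt/algo/run-algo stands in for whatever entry point runs the shared code against the fetched algo-file (presumably baked into the AMI).
#!/bin/bash
# Fetch this instance's unique algo-file from the dispatcher service
curl -sf https://example/gen-algofile/ -o /var/opt/algo-file
# Start the shared runner (hypothetical entry point) on the fetched file;
# it keeps running until the main instance sends a termination signal
/opt/algo/run-algo /var/opt/algo-file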
You could probably do a bunch of ECS run-task commands, with different environment variables in each command specifying which algo-file to use for each task, and code/configure your Docker containers in such a way that they keep running until a stop command is sent to each one. There would be no ECS service here; each task would just be a separate instance. In this scenario, you don't get the "If an algorithm's container fails, the ECS should replace it with a healthy container" feature.
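A rough sketch of one such launch, where the cluster name, task definition, container name, and ALGO_FILE variable are all hypothetical:
aws ecs run-task \
  --cluster algo-cluster \
  --task-definition algo-task \
  --count 1 \
  --overrides '{"containerOverrides": [{"name": "algo", "environment": [{"name": "ALGO_FILE", "value": "algo-0042.py"}]}]}'
Your main instance would later terminate one with aws ecs stop-task --cluster algo-cluster --task <task-arn>.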
If you wanted to run all these as a single ECS service, it would be more complicated. The ECS service is going to run N identical tasks. You would have to have some sort of external orchestration mechanism, maybe as simple as a DynamoDB table, that each task in the service connects to at startup to pick an algorithm that hasn't already been chosen by any other task. That's just a very high-level suggestion to give you an idea of how much of this you would have to build yourself, since an ECS service does not support what you are trying to do directly out of the box.
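A minimal sketch of that claim step, assuming a hypothetical algo-assignments table keyed on algo_file: at startup each task picks a candidate row and attempts a conditional write, treating a ConditionalCheckFailedException as "already taken, try the next one".
# claimed_by would be this task's own ID; table and attribute names are assumptions
aws dynamodb put-item \
  --table-name algo-assignments \
  --item '{"algo_file": {"S": "algo-0042.py"}, "claimed_by": {"S": "task-1234"}}' \
  --condition-expression "attribute_not_exists(algo_file)"
The conditional write is what prevents two tasks from racing for the same algo-file.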