My understanding of containerized applications is that each container should be an ephemeral unit that can and will be deprovisioned and reprovisioned unexpectedly, and should thus not be used for stateful purposes.
However, when looking into spinning up an application using Docker Compose, nearly every example I see has some sort of PostgreSQL or Redis service as part of its configuration.
Apologies if this is not the best way to frame this question, but what is the overall best practice here: to containerize the database and persist its data from the container to the host’s disk, or to run the database directly on the host and connect to it from the application container?
2 Answers
Containerizing a database and using it as a microservice are different ideas even though they get conflated. A container is in many ways a sibling to a virtual machine. It’s fine to containerize a database but definitely* don’t make it a microservice.
*Unless you know why you need it to be a microservice.
It is very reasonable to run stateless workloads in containers, and have a separate non-containerized database on bare metal or a hosted cloud service (e.g., AWS RDS), especially in production.
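As a rough sketch of that split, a Compose file for the stateless side might contain only the application and point it at a database that lives elsewhere. The image name, the hostname `db.example.internal`, and the `DATABASE_URL` variable are all placeholders I made up for illustration; your application may expect different settings.

```yaml
# Sketch only: the app runs in a container, the database does not.
# Image name, hostname, and variable names below are hypothetical.
services:
  app:
    image: registry.example.com/myapp:1.0
    environment:
      # Points at an external database (bare metal or a managed service).
      # DB_PASSWORD is read from the shell environment or an .env file.
      DATABASE_URL: postgres://app:${DB_PASSWORD}@db.example.internal:5432/app
    # Only relevant if the database runs on the Docker host itself (Linux):
    # this maps host.docker.internal to the host, and the URL above would
    # then use host.docker.internal as its hostname instead.
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

Nothing in this file stores state, so the `app` container can be destroyed and recreated freely.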
As you note, the lifecycles of databases and containers are very different. Databases must be backed up and kept available, and for traditional relational databases it’s tricky to get more replication than an active/standby pair and maybe a passive read replica. Containers are intended to be freely destroyed and replaced, and typical HTTP- or queue-based microservices can easily run multiple replicas.
If you look around Stack Overflow questions you’ll see a lot of things that don’t match production setups. (The biggest one is using `volumes:` for a live-development environment; production container-oriented deployments should just be able to `docker run` an image out of a repository without having any of the source code or assets locally.) A local database setup fits this category, but it’s a good idea for a development environment: you can use a database per project and they will be isolated from other databases, and it’s easy to reset if something goes really wrong.
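For the development case, a minimal sketch of a per-project database could look like the following. The service names, credentials, and the `db-data` volume name are placeholders, and I am assuming the official `postgres:16` image, which keeps its data under `/var/lib/postgresql/data`.

```yaml
# docker-compose.yml: development-only sketch, all names and credentials are placeholders
services:
  app:
    build: .
    environment:
      # "db" resolves to the database service on the Compose network
      DATABASE_URL: postgres://app:devpassword@db:5432/app
    depends_on:
      db:
        condition: service_healthy   # wait until the healthcheck below passes
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: devpassword
      POSTGRES_DB: app
    volumes:
      # Named volume: data survives container re-creation
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      timeout: 3s
      retries: 10

volumes:
  db-data:
```

Because the data lives in a named volume, `docker compose down` followed by `docker compose up` keeps it, while `docker compose down -v` removes the volume and gives you a clean database when something goes really wrong.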