Docker is a controversial technology. Depending on who you talk to, it's either the future of software, available now, or a rickety, half-baked system that causes nothing but problems. The truth is a little more nuanced: Docker does a lot of things right, but it also gives you plenty of freedom to shoot yourself in the foot.
Something everyone can agree on is how difficult it is for a new Docker user to figure out best practices. A lot of people keep running into the same problems.
We've been using containers for deployment of Node.js and PHP applications at my agency, Web Artisans, for the last 12 months. Docker has changed the way we ship software, but it hasn't always been smooth sailing. We've made our mistakes, learned from them and this is my effort to put together the key dos and don'ts of running Docker containers in production.
Let's start with a few principles which will make your first production Docker application more manageable.
Containers wrap processes
It's easy to get carried away when building a Docker image. You build an image based on ubuntu:trusty and go nuts with apt-get, installing everything your service needs to interact with the outside world.
Before you know it, you've packaged nginx, php-fpm, redis, memcached and mysql into a massive container image which takes 20 minutes to build, and you're wondering what all the hype is about. Wasn't one of the promises of modern web dev a tighter feedback loop than ever before?
Containers should be more granular. You're shipping execution environments for individual processes, not virtual machine images. Have a container for your application, a container for redis and a container for nginx. Containers are lightweight. You don't need to bundle everything into one image.
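As a sketch of what that granularity looks like, here is a hypothetical docker-compose.yml splitting a stack into one container per process (the service names and image tags are illustrative, not from a real project):

```yaml
version: "2"

services:
  # Your application code, built into its own versioned image
  app:
    image: myapp:1.0.0
    depends_on:
      - redis

  # Each supporting process gets its own lightweight container
  redis:
    image: redis:3-alpine

  nginx:
    image: nginx:1.11-alpine
    ports:
      - "80:80"
    depends_on:
      - app
```

Each service can now be rebuilt, scaled and replaced independently, and none of the images takes 20 minutes to build.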
Ship your application code in a container
Mounting your application code into a container as a volume is great for development and making changes on-the-fly, but it's a pain when you go to deploy in production. With volumes, you can't easily take advantage of orchestration benefits like rolling updates or canary releases, and it's hard to tell which version of your app a given container is actually running.
Build and tag your app container with the version contained within and enjoy easy deployments and predictable behaviour.
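In practice this is just a build, tag and push per release. A rough sketch, assuming a hypothetical app name and registry:

```shell
# Build the image with the application code baked in,
# tagged with the release version
docker build -t registry.example.com/myapp:1.2.0 .

# Push it so any machine in the fleet can pull that exact version
docker push registry.example.com/myapp:1.2.0
```

Deploying then means telling your orchestrator to run `myapp:1.2.0` instead of `myapp:1.1.0` — the running tag tells you exactly what's deployed.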
Keep your images small
One major concern when developing with Docker is managing your library of images. Pull in Ubuntu and you're already looking at a 300-400MB image just to ship 1-2MB of application code. Account for a few dependencies and a new image build a few times per week, and you end up staring down the barrel of a low disk space warning very quickly. Storage aside, it's never fun waiting for a massive image to pull whenever you want to spin up your app on a new machine.
Alpine Linux is a dramatically more lightweight Linux distribution which has gained tremendous popularity with Docker users. Alpine's base image is a mere 5MB, and Alpine-based images can be measured in tens of megabytes instead of hundreds. Alpine comes with a full-featured package manager and shell. If you can handle learning some new commands, Alpine will put your images on a serious diet.
Most popular official images also offer Alpine-based variants. Check the tags for your favourite images to see if you can slim down your dependencies too.
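For example, a minimal Dockerfile for a Node.js service on an Alpine variant of the official Node image might look like this (the tag and file paths are assumptions for the sake of illustration):

```dockerfile
# Alpine-based Node image: tens of MB instead of hundreds
FROM node:6-alpine

WORKDIR /usr/src/app

# Copy the manifest and install dependencies first,
# so this layer is cached between code changes
COPY package.json .
RUN npm install --production

# Ship the application code inside the image
COPY . .

CMD ["node", "server.js"]
```

Switching a `FROM node:6` line to `FROM node:6-alpine` is often all it takes, provided none of your dependencies need glibc or other packages Alpine doesn't ship by default.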
Containers are disposable
Do not use containers for data persistence in production.
Let me repeat this, because it is critical.
Do not use containers for data persistence in production. If you use containers for data persistence you will have a bad time.
There are a number of reasons for this which I'll delve into in a future article, but essentially it boils down to three things:
- Containers are great because they can be quickly spun up or down,
- Services which scale horizontally are great candidates for containerization. Services which scale vertically (like databases) are not, and
- There can be weird I/O and latency issues with container-to-container networking and volumes which aren't great for database use.
Database containers are great for development. It's awesome being able to spin up a disposable DB container in a CI environment. In production, you should only deploy containers which can safely be terminated and have their filesystems wiped at any time, without warning: application containers.
Use an external database server with fail-over. Store files in S3 or Google Cloud Storage. The only data persisted inside a container should be an ephemeral application cache which can be safely lost at any time.
Containerization and Orchestration are different problems
Docker tries to do too much. As a container solution it is perfectly fine, but orchestrating a fleet brings with it a boatload of new challenges. Docker Compose is fine for spinning up a quick-and-dirty development or CI environment, but it's lackluster when it comes to building out production infrastructure and coordinating deployments. Docker Swarm is immature and lacks critical features like rolling updates that you get for free with a (relatively) mature orchestration solution like Kubernetes.
We've been running a few Kubernetes clusters for the past year and we've had a lot of success doing so. The command line management tool is extremely powerful and the declarative manifests make it obvious where all of your configuration values are coming from. The Secrets API also gives you a convenient way to store sensitive data for your application outside source control.
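To give a flavour of those declarative manifests, here is a sketch of a Kubernetes Deployment which pulls a versioned image and reads a credential from the Secrets API (the project, image and secret names are hypothetical):

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          # A specific, versioned image — bump this tag and
          # re-apply to trigger a rolling update
          image: gcr.io/my-project/myapp:1.0.0
          env:
            # Sensitive values come from the Secrets API,
            # not from source control
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: myapp-secrets
                  key: database-password
```

Everything about the deployment — replica count, image version, configuration sources — is visible in one file you can review and version.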
There are other providers of orchestration software (like Amazon Container Service), but we've had a great experience with Kubernetes and will continue down that path for the time being.
Buy into an ecosystem
It's possible to spin up a few VMs on Digital Ocean and bootstrap your own Docker registry and Kubernetes cluster for your production deployment, but that provides a whole new set of challenges. Networking and container orchestration are the hardest parts of getting a new deployment off the ground.
The Kubernetes project has been working hard to make this process easier in their newest versions, however buying into an established provider and their ecosystem is the quickest way to get your application into production.
Google Container Engine (GKE) is a mature provider of hosted Kubernetes. There are a few other providers out there, but we've found Google's implementation to be the most stable and seamless solution to date.
This is the first in a number of container-related articles I'm planning to write. I'd be interested to hear your experiences using Docker and various orchestration solutions.