The 12 Factor Container

Tim Bannister
6 July, 2020


Both Docker and the 12 Factor App methodology date from 2011. The big innovation that Docker provided was less about its code and more about its way of thinking about containers and the images they’re based on. I’m going to look at how well these two ideas work with each other.

When Adam Wiggins (who founded Heroku) published that 12 Factor methodology, it was aimed squarely at SaaS app development and operations. At the time, DevOps was a hot buzzword; twelve-factor offered a single set of qualities that were relevant both to coding and to running the app.

At the same time in 2011, Docker wasn’t even a buzzword. There was a company, dotCloud, who made a tool (dc) that you could use to create images, push them, pull them, and run them. Linux containers in this form had been around for roughly a decade already, starting with the vserver patches that I remember relying on in my first ever IT job.

At this point, Heroku had a thing like container images (“slugs”), whereas dotCloud was doing in-place deploys. Bear in mind that I didn’t use dotCloud myself in the early days, so I’m relying on the firm’s own history and on web pages from the time. Anyway, fast forward to the present day and imagine a contemporary, containerised, cloud-native workload.

I’m going to run through those twelve factors one by one and talk about how well that workload fits each of them.

1. Codebase

One codebase tracked in revision control, many deploys. It’s so simple, right?

OK, so here’s my first wrinkle. Let’s say you’re using Docker to build a container, and your Dockerfile looks something like:

FROM python:3.7
COPY . /app
RUN pip install exampledependency
CMD [ "python", "/app/app.py" ]

Standard, right? But you don’t have one codebase, not quite. You have the codebase for your app and you have the codebase for the exampledependency package, and you have the codebase for the python base image. Back in 2011 it was the norm to think about code as being linked to apps, without worrying much about source code for the system the apps ran on top of. In fact, Heroku ran a business based on telling you not to worry.

It’s still good advice. Your codebase definitely should live in revision control. You shouldn’t have a repo per customer, nor a development branch that’s a long way from production, and you shouldn’t treat deploying changes as a source of fear. Plus, using something like a Dockerfile lets you…

2. Dependencies

Explicitly declare and isolate dependencies. This is the real strength of using containers. Your app has a bunch of dependencies that all live in layers 1 through n of your built image. Everything after that comes from your codebase. Let’s specially call out one bit of explicit declaration, too: the latest tag. If that Dockerfile had used python:latest, my app could be in for a world of trouble if and when Python 4 comes out. With tags for your base image, be as precise as you need to be.

So you’re covered? Well, maybe. Are you relying on tools like curl or cfssl that come as part of the base image as a side effect of how it was built? You shouldn’t. To tick this item off your list, either create your own base image that explicitly adds those tools, or alter your app to achieve the same thing in code.
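To make the earlier point about base image tags concrete, here’s a minimal sketch of a check you could run in CI to flag floating references. The function names are mine, not part of any Docker tooling, and the parsing is deliberately simple: it assumes references shaped like name, name:tag, registry:port/name:tag or name@digest.

```python
def image_tag(reference: str) -> str:
    """Return the tag of an image reference, or "latest" if none is given."""
    # Drop any digest ("@sha256:..."); what is left is name[:tag].
    name = reference.split("@", 1)[0]
    # Only look for a colon in the last path component, so a registry port
    # such as "localhost:5000/app" is not mistaken for a tag.
    last_component = name.rsplit("/", 1)[-1]
    if ":" in last_component:
        return last_component.split(":", 1)[1]
    return "latest"  # Docker's default when no tag is written


def is_floating(reference: str) -> bool:
    """True when the reference follows whatever "latest" currently points at."""
    if "@" in reference:
        return False  # pinned by digest
    return image_tag(reference) == "latest"
```

With this, `is_floating("python:3.7")` is false while `is_floating("python")` is true. How precise a tag needs to be is still your call; this only catches the case where nothing was declared at all.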

3. Config

Wiggins’ next nugget of advice is to store config in the environment. This one is a big part of how people really get value from containers and a cloud-native philosophy. If you’re running containers in Kubernetes, you use a ConfigMap (or a Secret) to copy information into environment variables, and you definitely don’t bake confidential data into your container images.

If you’re using a sidecar or an init container to inject application settings, you’re falling short for this factor. You could rewrite your app to load data specifically from its environment variables and pay down some tech debt.
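As a rough illustration of loading settings from the environment, here’s a minimal sketch. The variable names (DATABASE_URL, LOG_LEVEL, REQUEST_TIMEOUT_SECONDS) and the Settings class are assumptions for the example, not something the methodology prescribes.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    request_timeout_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, failing fast if one is missing."""
    try:
        database_url = env["DATABASE_URL"]  # required: no sensible default
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set in the environment") from None
    return Settings(
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "INFO"),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "5")),
    )
```

Passing the environment in as a mapping keeps the function easy to test, and failing at startup beats discovering a missing setting on the first request.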

4. Backing services

The fourth tip is to treat backing services as attached resources. In the cloud-native world, Kubernetes has this philosophy baked in. Inside your cluster, even the Kubernetes API is exposed as a Service (in the Kubernetes sense) and published for your Pods to use. Containers provide isolation and they let you write code that doesn’t know, let alone care, whether the service it’s interacting with is a sidecar, a StatefulSet, something outside the cluster, or a debug version running on your laptop. It’s a strongly held opinion and one that containers help you achieve.
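One way to picture an attached resource is as nothing more than a URL handed to the app. In this sketch the function name and the example hostnames are hypothetical; the point is that the same code handles the in-cluster database and the one on your laptop.

```python
from urllib.parse import urlsplit


def describe_backing_service(url: str) -> dict:
    """Split a service URL into the parts a client library would need."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "name": parts.path.lstrip("/"),
    }


# Hypothetical URLs: one for a cluster Service, one for a local debug copy.
in_cluster = describe_backing_service("postgresql://orders-db.prod.svc:5432/orders")
on_laptop = describe_backing_service("postgresql://localhost:5432/orders")
```

Swapping one backend for the other is a change to the URL, not to the code.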

5. Build, release, run

Continuing, we come to strictly separate build and run stages. You might think this one comes with the territory. Absolutely it should.

You can’t assume that, because someone is using containers, they’re doing it the right way. I’ve seen teams looking to run configuration management in containers, even using container services to implement durable virtual machines (if you want that, maybe look at KubeVirt).

To take this philosophy to heart, consider making a multi-stage build for your app. Once you’ve compiled or produced the artifacts that need to live in the running container, place just what’s needed into a new image and hand that off for testing and release. If you’re writing in a language like Rust that produces binaries, your final image might have the one binary in it, and nothing else at all.

The point I want to make here is that, actually, it’s easy to fall short on this factor. If you’re reviewing legacy code or migrating it to containers, watch for snags here.

6. Processes

(Execute the app as one or more stateless processes).
On the 12 factor side, there’s a clear steer that you’re making a web app. Making a web app’s implementation be stateless fits well with the matching architectural style from REST: the interactions between client and server are stateless. Your 12 factor app becomes an adapter between the client and other service(s).

The cloud native mindset has strong views here too, and the two are much more similar than different. Twelve factor asks you to treat the local storage as ephemeral; cloud-native recommends making it immutable. There are differences, though. The obvious one is that the cloud-native approach is to wrap each of those stateless processes in its own container. Also relevant, though, are the persistence layers. 12 factor apps rely on external services they can take for granted, and that’s a good approach. The cloud native approach and ecosystem is broad enough to include the backends too. If you need your service to offer persistence to its clients, somewhere there needs to be storage and state.

Driving that insistence, back in 2011, was the idea of apps saving session state locally. In 2020 it’s common for apps to shift most or even all of their session state to the client, with zero lines of your own code involved in authenticating clients, persisting app data between HTTP requests, or handling logouts.

In a world where there’s no session state to share, the containers that provide an API to the client-side code get an easy ride. If your app works with shared-nothing, stateless containers, that makes related services such as load balancing much easier to design, implement and operate.

If you need to include users (and other clients) that can’t or don’t run JavaScript, you can implement server-side rendering: the client provides context (such as cookies) in each request, and the web app queries a backend service to load the state based on the context. You can do this in containers just as easily as not.
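Here’s a small sketch of that request flow, assuming a session cookie named session_id and a backend with a single load method. Both are invented for the example, and the in-memory store merely stands in for a remote service.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    def load(self, session_id: str) -> Optional[dict]: ...


class InMemoryStore:
    """Stand-in for a remote backend; a real one would live outside the container."""

    def __init__(self, data: Dict[str, dict]):
        self._data = dict(data)

    def load(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)


def render_greeting(cookies: Dict[str, str], store: SessionStore) -> str:
    """Handle one request using only the request context and the backend."""
    session = store.load(cookies.get("session_id", ""))
    if session is None:
        return "Hello, guest"
    return f"Hello, {session['name']}"
```

Because the handler keeps nothing between calls, any replica can serve any request, which is what makes the load balancing story so simple.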

Following on are two related points. The first:

7. Port binding

Export services via port binding; this might surprise people who are relatively new to IT. Having components listen on a socket is not just common, it’s already a norm.

Rewind to 2011 and it was a different story: application servers still implemented their own IPC protocols and systems operators integrated the webserver with the delivered application. Twelve-factor served as a manifesto for a new approach.
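For anyone who hasn’t seen the self-contained version, here’s a minimal sketch using only the Python standard library. The PORT variable and the default of 8080 are assumptions; your platform may use a different convention.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real app would log each request


def make_server(port: int, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Bind the app itself to a port; no separate web server is injected."""
    return ThreadingHTTPServer((host, port), OkHandler)


def main() -> None:
    # PORT is an assumed variable name, read from the environment.
    make_server(int(os.environ.get("PORT", "8080"))).serve_forever()
```

Calling `main()` runs the server in the foreground, which also suits the next factor.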

…and that leads into the next item:

8. Concurrency

Scale out via the process model. With containers you add capacity by running more containers. Depending on how you host those containers you might actually add replica Pods, or replica ECS Tasks, etc. It doesn’t matter; these really are different views of the same core idea.

What’s crucial in both the container and the 12 factor way of thinking is that the application doesn’t detach and become a daemon. It runs in the foreground and relies on something external to supervise it. Well before people were talking about entrypoints and not needing to place an init inside a container, the 12 factor approach was already telling you to let another tool supervise your application.

It’s perhaps worth pointing out that just deploying your app inside a container does not guarantee there’s only one running process.
Phusion’s Hongli Lai wrote a good article: Docker and the PID 1 zombie reaping problem, back in 2015, that tells you how to spot whether you’re doing it right, with tips for you if you aren’t.

Ok, we’re two thirds done. These last four are all solid advice.

9. Disposability

(Wiggins is for it). No surprise there, I hope.

Your app should expect to be started—and stopped—without notice. The specific counsel for this point is to maximize robustness with fast startup and graceful shutdown, and I think the details are worth reading through.

Containers aren’t inherently fast at starting up; in fact, compared to bare processes the container runtime always adds some overhead. Where I think containers shine is in revealing the hold-up. Compare a container task startup to running on a cloud virtual machine. Even with API-based provisioning and prepared, application-specific images, the virtual machine approach is likely to take tens of seconds to start up from its own overhead. Container runtimes typically add 100ms or less, so a long startup has to be down to the application code.

I’ve worked with clients whose VMs take over an hour to go into service, and I know that firms who try shifting something like that into a container are some good way short of true disposability.
You need to be able to shut down half (or, actually, all) of your app instances, at least as a thought experiment, and have reasonable confidence that the service comes back on line in seconds.

There’s more! The philosophy of running in containers, along with twelve-factor thinking, calls for a failure tolerant design. Your stateless app needs to track its work backlog using a remote, clustered service (or not have a work backlog — that’s a valid approach too). If you do have long-running tasks then implement a “dying gasp” behaviour that stops progress and explicitly returns part-processed jobs to the queue.
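A “dying gasp” could look something like this sketch. The Worker class and the job layout (an id plus a list of steps) are hypothetical, and the deque stands in for the remote, clustered queue; it also assumes that steps are safe to repeat when another instance retries the job.

```python
import signal
from collections import deque


class Worker:
    def __init__(self, queue: deque):
        self.queue = queue  # stand-in for a remote, clustered queue
        self.completed = []
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        """Usable as a signal handler: it only records the request."""
        self.stop_requested = True

    def run(self, handle_step) -> None:
        while self.queue and not self.stop_requested:
            job = self.queue.popleft()
            for step in job["steps"]:
                if self.stop_requested:
                    # Dying gasp: return the part-processed job to the
                    # front of the queue so another instance can retry it.
                    self.queue.appendleft(job)
                    return
                handle_step(job, step)
            self.completed.append(job["id"])


def install_signal_handlers(worker: Worker) -> None:
    # Container runtimes usually send SIGTERM before killing the process.
    signal.signal(signal.SIGTERM, worker.request_stop)
```

The handler does as little as possible; the loop notices the flag between steps, so shutdown is prompt without abandoning work silently.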

Recursive restartability (perhaps provided by your container control plane?) lets you deliver value by combining narrowly-focused components. The narrower the focus, the easier it is to make it disposable: look at function-as-a-service offerings such as AWS Lambda for a bunch of real-world examples.

10. Dev/prod parity

You should keep development, staging, and production as similar as possible.

If you have a staging environment, definitely make sure the way you deploy to it matches production. Don’t have a Dockerfile that’s special to each environment; instead, build artefacts once and promote them during release.

What about the difference between live and development? In a container world, that means deploying for local testing in a container. My own take on this is that you should write tests you can run locally. That’s pretty much the top original use case for Docker, so you can definitely do this in a container and cloud-native world.

The 12 factor methodology additionally recommends deploying changes as soon as they’re ready, and cautions against using different backing services between environments (e.g. SQLite locally, but PostgreSQL in production). Previously this was easy to state but harder to implement; with public container registries, you can probably find a container image for your favourite backend, ready to pull and run.

It’s less easy with services that started life in the cloud. Some, like AWS DynamoDB, are getting in on the act with a containerised version available for local development. Trouble is, even official alternatives aren’t always faithful to the original, and the real thing often costs real money (sometimes, quite a lot of real money).

In our experience, there’s no one good answer here. I think that’s why the 12 factor methodology says “as similar as possible” rather than “make them identical”. You have to understand the trade-offs and pick an approach.

On to:

11. Logs

You should treat logs as event streams. Yep. Good idea. Here is where I think the ecosystem around containers shows a rough edge or two. The problem isn’t that there’s no standard for logging from containers; it’s just how Andrew Tanenbaum put it back in the 1980s: “the nice thing about standards is that you have so many to choose from”.

Our menu includes:

Docker json-file output: each line is a separate JSON document, timestamped in a key called time.
Text: the application writes to standard output and standard error, and each line represents a new entry. The container runtime captures timestamps and other metadata. (This is what the 12 Factor methodology recommends).
fluentd: a popular de facto standard and often used with containers.
Graylog Extended Log Format, another popular standard.
OpenTelemetry: logging, metrics and tracing all in one.
Syslog: this is actually a family of related standards. In theory you can have structured metadata, message authenticity checks and more; in practice, it’s rare to have both the client and the server implement those features how you want them to.
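For the plain text option from that menu, a minimal sketch with Python’s standard logging module might look like this. The logger name and line format are my own choices; timestamps are left out on the assumption that the container runtime adds them.

```python
import logging
import sys


def configure_logging(stream=None) -> logging.Logger:
    """Send one line per event to standard output and let the runtime do the rest."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    # No timestamp here: the assumption is that the runtime captures it.
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

The app never opens a log file or talks to a log server; routing the stream somewhere useful is left to the platform.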

Anyway, as soon as there were at least two options on the table it wasn’t looking great. With so many different viable approaches it’s a bit of a pain to hook things up. Once you’ve transported logs to a tool that can process them, the container / cloud-native story looks the same as the 12 Factor one; it’s just getting there that’s a pain.

The 12 Factor methodology doesn’t talk as much about other aspects of observability, such as measuring outcomes or counting failures. Logging aside, these look much the same inside containers as outside them. I’ll move on.

12. Admin processes

The methodology wants you to run administrative tasks as one-off processes and specifies some aspects in particular:

run the administrative task in the same environment and from the same build
use the same dependency isolation technique
commit even single-use scripts into source control

Those details really narrow it down. The way I read this, if you’re using containers and want to follow 12 Factor, you use the same container image to run the regular app and to execute one off tasks. You might have multiple entrypoints, or one entrypoint with different subcommands.
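The “one entrypoint with different subcommands” option can be sketched with argparse. The serve and migrate commands here are placeholders that just return a string; in a real app they would start the server or run the migration.

```python
import argparse


def serve() -> str:
    return "serving"  # placeholder for starting the long-running app


def migrate() -> str:
    return "migrated"  # placeholder for a one-off admin task


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="app")
    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("serve", help="run the app").set_defaults(func=serve)
    commands.add_parser("migrate", help="run a one-off task").set_defaults(func=migrate)
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    return args.func()
```

The same image then runs the regular app with the serve argument and the one-off task with migrate, so both share a build and a set of dependencies.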

In Kubernetes, this kind of work is the world of the Job rather than a long-running Deployment or StatefulSet. You can use the operator pattern to take care of maintenance tasks: running database migrations, backups, and more.

You don’t have to implement an operator; depending on your application, you might be able to move repair tasks and database schema setup into the main app (it helps if the backend supports idempotent schema changes).
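To show what I mean by idempotent schema changes, here’s a minimal sketch. It uses SQLite purely so that the example is self-contained, and the table and index names are invented; check what your own backend supports.

```python
import sqlite3


def ensure_schema(conn: sqlite3.Connection) -> None:
    """Safe to call on every startup: each statement is a no-op if already applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS orders_status ON orders (status)")
    conn.commit()
```

Because running it twice leaves the database unchanged, every replica can call it at startup without coordinating with the others.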

Do you need to follow 12 Factor? Not if it’s not helpful. This item in particular is one where I think it makes sense to do what feels right – so long as you have a clear story about why it’s a good fit.

The exceptions

The 3rd factor in the list was to store configuration in the environment. If you need to put in a list of certificate authorities, you’ve got a few options: you can add that into the container image, you can provide a URL of a CA file and a signature, or you can mount the CA list into the container at runtime.

You might or might not consider CA information as configuration; anyway, I thought I’d highlight it.

As I mentioned just above, if you go your own way on admin processes (factor twelve), then that’s reasonable if you’ve got a good reason!

The overlap

You’ve seen how the 12 Factor methodology maps to a world of containers and their contents. Or maybe you skipped down and want to know the skinny.

I want to reiterate the point about isolating dependencies. Prune your containers to your app itself, its runtime if it needs one, and any direct requirements such as system libraries.

The ideal is that you can pass all the tests even if every other binary in the container is stripped away. The next best thing is to bundle other tools into the container–that’s a real strength of the technology–and do this explicitly. Pin the tool and its version in your source code, so that you can rebuild in a month or a year’s time and get the same behaviour.

Much of the good advice about taking 12 Factor principles and applying them to containers is applicable to other approaches that achieve the same thing. If you’re baking images for virtual machines, and the way you build and use those images is like a cloud-native deployment pipeline, you’re doing fine. As the saying goes: if it ain’t broken, don’t fix it.

The two different approaches mostly complement one another. That’s good, right? Whenever there are two schools of thought that mostly conflict, that feels to me like a sign that at least one of them isn’t up to scratch.

Advice like build, release, run or explicitly declare and isolate dependencies is important. Using containers gives you a great experience for building, testing and orchestrating the pieces of your system. It’s a flexible approach that suits almost any kind of software-as-a-service workload. I hope I’ve made it clear why I think the 12 Factor app methodology still has a place in our minds and our ways of working.

Want an easier route to running your workload in containers, with security, build automation, and scalability designed in? We offer a number of packaged application platforms that provide exactly that.
