If you’re familiar with Dockerfiles, you probably know that they consist of a set of instructions. Each instruction results in a new Docker layer. You can use that fact to optimise your deployment times, as well as your storage and bandwidth costs, but it requires some strategic planning. Let’s explore how.
To make this plan, you need to understand what a Docker layer is. You may already know that it’s the basic unit of a container image. Internally, you can think of a container image layer as an image in its own right, with an automatically generated ID instead of a tag assigned by you.
Separate layers are meant to help you optimise image build time, get faster downloads when part of an image is already cached, and create faster feedback loops. That’s because Docker caches each layer so it can reuse it in future builds, but only as long as the preceding layers remain unchanged. Changing an earlier layer invalidates the cache, so every subsequent layer is rebuilt from scratch.
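To make the invalidation rule concrete, here is a toy model in shell. It is only a sketch, not Docker’s actual algorithm: the hypothetical helper cache_keys derives each layer’s key from its parent’s key plus the instruction text, whereas the real builder also takes into account things like the checksums of copied files.

```shell
# Toy model: a layer's cache key depends on its parent's key and its own
# instruction text. cache_keys is a hypothetical helper, not a Docker command.
cache_keys() {
  parent_key=""
  while IFS= read -r instruction; do
    parent_key=$(printf '%s\n%s\n' "$parent_key" "$instruction" | cksum | cut -d' ' -f1)
    printf '%s  %s\n' "$parent_key" "$instruction"
  done
}

# Original build.
printf '%s\n' 'FROM ubuntu:18.04' 'RUN apt-get update' 'COPY hello.c .' | cache_keys

# Same build with the second instruction edited.
printf '%s\n' 'FROM ubuntu:18.04' 'RUN apt-get update -q' 'COPY hello.c .' | cache_keys
```

In the second run, the key of the first instruction is unchanged, while the keys of the edited instruction and of everything after it differ, even though the COPY line itself was not touched.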
And here’s where careful planning comes into play. Firstly, you want to place instructions that change less frequently before the ones that change often. That can mean reusing another base image or taking advantage of multi-stage builds. Another common tip is to reduce the number of layers. For example, you can combine multiple instructions into one by concatenating them.
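As an illustration of both tips, the snippet below writes a small Dockerfile in which the rarely changing toolchain installation comes first and is concatenated into a single RUN, while the frequently changing source file comes last. The file name Dockerfile.cached is an arbitrary choice for this sketch, and hello.c is assumed to sit next to it.

```shell
# Write an example Dockerfile; the file name Dockerfile.cached is arbitrary.
cat > Dockerfile.cached <<'EOF'
FROM ubuntu:18.04

# Rarely changes: toolchain installation, concatenated into a single RUN
# so that it produces one layer instead of two.
RUN apt-get update && \
    apt-get install -y gcc build-essential

WORKDIR /root

# Changes often: the source file and its compilation come last, so editing
# hello.c only invalidates the cache from this point onwards.
COPY hello.c .
RUN gcc -o helloworld hello.c
EOF
```

It could then be built with docker build -f Dockerfile.cached -t hello:latest . from the directory that contains hello.c.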
All these options come with their own disadvantages which can include reducing the readability of your Dockerfile. This may result in a poor developer experience or waste your time when trying to understand what the author originally had in mind. Instead of hitting these kinds of problems you may consider using docker-squash.
docker-squash: how to?
If you’re familiar with git’s squash concept, then you may already have an idea what docker-squash is about. If not, think about combining multiple image layers into one. Let’s see that in action with the following Dockerfile:
    FROM ubuntu:18.04 AS compile-image
    RUN apt-get update
    RUN apt-get -qq -y install curl
    RUN apt-get install -y gcc build-essential
    WORKDIR /root
    COPY hello.c .
    RUN gcc -o helloworld hello.c
    RUN apt-get remove -y gcc build-essential
    RUN apt-get -y autoremove
Imagine that you build an image using docker build -t hello:latest . (note the trailing dot, which is the build context) and then check its size by running docker images: I saw 324 MiB. You can then check which layers it consists of with docker history hello:latest. That command will also display the auto-generated ID for each layer, if recorded.
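The build-and-inspect steps can be wrapped in a small helper. This is only a sketch: build_and_inspect is a hypothetical function name, and it assumes the Dockerfile and hello.c are in the current directory.

```shell
# Hypothetical helper: build an image from the current directory and inspect it.
build_and_inspect() {
  image=${1:-hello:latest}

  # Build from the Dockerfile in the current directory (the build context).
  docker build -t "$image" . || return 1

  # Show repository, tag, image ID and total size.
  docker images "$image" || return 1

  # Show one row per layer, newest first, with the size each one adds.
  docker history "$image"
}
```

Calling build_and_inspect hello:latest runs the three commands in order and stops at the first failure.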
Now we have some options available to us: leave the image as-is; squash all the layers; squash down to a selected layer using that layer’s ID; or squash by specifying how many layers we want to squash. This choice should always be made on a case-by-case basis, depending on the image structure. You’ll read more about that below, where I will show how to squash down to one layer, and then how to squash everything but the last few layers.
The simple option is to squash everything: install docker-squash and run docker-squash -t hello:squashed hello:latest. I’ve specified the target image and tag using -t, and the source image plus its tag goes last on the command line. What I get is a new tag (hello:squashed) with just one layer.
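Put together, squashing everything might look like the sketch below. The helper name squash_all is hypothetical, and the install hint in the comment assumes that Python and pip are available on the machine.

```shell
# One-off install, assuming Python and pip are available:
#   pip install docker-squash

# Hypothetical helper: squash every layer of a source image into a new tag.
squash_all() {
  source_image=$1
  target_image=$2

  # No -f option: all layers of the source image are squashed.
  docker-squash -t "$target_image" "$source_image" || return 1

  # The history of the new tag should now show a single layer.
  docker history "$target_image"
}
```

For the example image this would be called as squash_all hello:latest hello:squashed.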
I tried this with a simple build and my target image’s size was reduced by roughly 63%: that 324 MiB has become 120 MiB.
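If you want to check a figure like this for your own images, the arithmetic is simple. The helper below is a sketch (reduction_percent is a made-up name) that takes the size before and after squashing in the same unit. It is fed the rounded MiB values quoted above, so the result can differ slightly from one computed from exact byte counts, which docker image inspect --format '{{.Size}}' would give you.

```shell
# Percentage saved when an image shrinks from $1 to $2 (same unit for both).
reduction_percent() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Rounded sizes from the example: 324 MiB before, 120 MiB after.
reduction_percent 324 120
```

With these rounded inputs the helper prints 63.0.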
What if I want to squash only the most recent layers and keep the 3 oldest ones (the rows listed last by docker history)? Knowing that my image has 11 layers, I could run docker-squash -f 8 -t hello:squashed hello:latest, where -f specifies how many layers I want to squash. Alternatively, if I know the ID of the layer to squash from, I could pass that to -f instead.
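The layer arithmetic can be wrapped in a small helper. This sketch assumes, as the docker-squash documentation describes, that a numeric -f value counts layers from the newest one down, so squashing total minus keep layers leaves the oldest keep layers untouched. The function name squash_keeping_oldest is hypothetical, and the total layer count is passed in by hand rather than detected.

```shell
# Hypothetical helper: squash all but the oldest N layers of an image.
# Usage: squash_keeping_oldest TOTAL_LAYERS KEEP_OLDEST SOURCE TARGET
squash_keeping_oldest() {
  total_layers=$1
  keep_oldest=$2
  source_image=$3
  target_image=$4

  # A numeric -f value is the number of layers to squash, newest first.
  layers_to_squash=$((total_layers - keep_oldest))

  docker-squash -f "$layers_to_squash" -t "$target_image" "$source_image"
}
```

For the example image, squash_keeping_oldest 11 3 hello:latest hello:squashed runs docker-squash with -f 8.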
docker-squash: when to?
Remember that by using docker-squash you may lose the benefits of layer caching, as well as the parallel downloads you get when an image is split across several layers rather than packed into a single one. That’s why you should tailor your solution carefully. Below I’ll present some use cases where docker-squash may be particularly useful.
1. Temporary files
Sometimes you need to download some temporary files in one Docker layer just to remove them in a subsequent one. In such a case, they’ll still contribute to your Docker image size, because deleting files in one layer doesn’t remove them from the previous ones. Once you merge these layers, only the net difference produced by the merged instructions is preserved, so you can optimise your image size. The same applies to multiple layers modifying the same files: squashing keeps only the final delta of all merged layers.
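The effect can be imitated without Docker at all. In the toy model below, two directories stand in for two layers: the first adds a 1 MiB temporary file, and the second records its deletion as an empty marker file, in the style of the .wh. whiteout files used by the OCI image layer format. All paths and file names are made up for the illustration, and real layers are tar archives rather than plain directories.

```shell
# Toy model only: two directories stand in for two image layers.
workdir=$(mktemp -d)
mkdir "$workdir/layer1" "$workdir/layer2" "$workdir/squashed"

# Layer 1 adds a 1 MiB temporary download.
head -c 1048576 /dev/zero > "$workdir/layer1/download.tmp"

# Layer 2 deletes it: the deletion is recorded as an empty whiteout marker.
: > "$workdir/layer2/.wh.download.tmp"

# Unsquashed: both layers are stored and shipped, so their sizes add up.
layer1_bytes=$(wc -c < "$workdir/layer1/download.tmp")
layer2_bytes=$(wc -c < "$workdir/layer2/.wh.download.tmp")
unsquashed_bytes=$((layer1_bytes + layer2_bytes))

# Squashed: apply the layers in order and keep only the end result.
cp "$workdir/layer1/download.tmp" "$workdir/squashed/"
for marker in "$workdir"/layer2/.wh.*; do
  name=${marker##*/.wh.}
  rm -f "$workdir/squashed/$name"
done
squashed_bytes=$(( $(find "$workdir/squashed" -type f -exec cat {} + | wc -c) ))

printf 'unsquashed: %s bytes, squashed: %s bytes\n' "$unsquashed_bytes" "$squashed_bytes"
rm -rf "$workdir"
```

The deleted file still accounts for the full 1 MiB while the layers are kept separately, and for nothing once they are merged.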
2. Keeping things safe
Ideally, there aren’t any secrets written to your layers. Nevertheless, it’s good to know that if you want to avoid the possibility of someone retrieving files from a specific layer, removing them in a separate instruction won’t be enough. That’s a perfect reason to use docker-squash.
3. Partial squashing
4. CI/CD pipelines
Preferably your images are built by a CI/CD pipeline, where the Docker image cache starts from scratch. In such a case, you don’t have to worry much about the caching behaviour, and can just optimise an image for size.
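A pipeline step along these lines could build the image, squash it, and push only the squashed result. This is a sketch under assumptions: ci_build_squash_push is a hypothetical function name, the registry in the comment is a placeholder, and the CI runner is assumed to have Docker and docker-squash installed and to be logged in to the registry.

```shell
# Hypothetical CI step: build, squash, then push only the squashed image.
# Usage: ci_build_squash_push BUILD_TAG RELEASE_TAG
ci_build_squash_push() {
  build_tag=$1     # intermediate image, e.g. hello:latest (never pushed)
  release_tag=$2   # e.g. registry.example.com/hello:squashed (placeholder)

  # Build from the Dockerfile in the checked-out repository.
  docker build -t "$build_tag" . || return 1

  # Squash all layers into the tag that will be released.
  docker-squash -t "$release_tag" "$build_tag" || return 1

  # Publish only the squashed image.
  docker push "$release_tag"
}
```

Because each step returns early on failure, a broken build or squash never results in a push.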
5. Readable Dockerfiles
As mentioned before, it’s recommended to use as few layers as possible. A common technique is to concatenate RUN commands in a Dockerfile, which cuts image size but also decreases readability. This may significantly raise the barrier to entry for new hires, and also negatively affect the developers’ experience. Instead, you can keep your Dockerfile instructions separate for the sake of simplicity, and squash the layers after the image build to reduce the time taken for image pulls and container launches.
docker-squash is a useful thing to have in your toolset to make your container images lighter and deployment times faster, without trading away the readability of your Dockerfile. As shown above, using the tool is simple; the real value lies in knowing when to use it to gain the maximum benefit.
This blog is written exclusively by The Scale Factory team. We do not accept external contributions.