June 1st 2020
experienced system architect, tech leader, tech researcher
This article describes one of the latest trends in the container world: distroless containers.
Containers are the preferred method of deploying business applications in all business sectors, so optimizing how companies use containers has a direct impact on the infrastructure behind their critical applications.
Why reduce the size of Docker container images? Computers are faster and have more resources, so maybe it’s not that important. The answer is simple: for performance and security. If you need more, here’s another reason: money.
Docker images are copied, transmitted, and launched by container fleet managers. All of this takes time, spent on disk I/O and network I/O.
Low spec machines are cheap in the cloud, so fitting more containers onto one machine means fewer machines to spawn.
Attack surface can be explained in a few words: whatever your Docker image contains can be attacked, and the more it contains, the more likely it is to be attacked. It’s that simple. Usually, Docker images based on a Linux distribution contain tons of stuff you will never need, but which attackers can use to break into your system.
Remember your favorite cloud provider and its offer. Let’s imagine you want to pay only for a low spec machine with just 1 GB of RAM. As a rough simplification that treats image size as the footprint of a running container: if your image size is 500 MB you can fit two containers, but if you reduce it to 100 MB you can fit ten. It really matters! Even if you are not an internet behemoth like Google or Netflix, and even if you have relatively small applications with far fewer users, the costs accumulate over time.
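The arithmetic behind this estimate can be sketched in a few lines of shell. It is only the simplified model from the paragraph above, in which one size figure stands in for the whole per-container footprint; `containers_per_machine` is a hypothetical helper, and 1 GB is taken as 1024 MB.

```shell
# Simplified capacity model: how many containers of a given size fit into a
# fixed budget. Shell integer division rounds down.
containers_per_machine() {
  budget_mb=$1   # capacity of the machine in MB (assumed figure)
  image_mb=$2    # footprint of one container in MB (assumed figure)
  echo $(( budget_mb / image_mb ))
}

containers_per_machine 1024 500   # prints 2
containers_per_machine 1024 100   # prints 10
```

Real capacity planning would also have to account for the memory the application itself uses at runtime, which this sketch ignores.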
What is a distroless image?
A distroless image is a slimmed down Linux distribution image plus the application runtime, resulting in the minimum set of binary dependencies required for the application to run.
Distroless images are based on Linux distributions; distroless is not a bare image (without anything). For instance, the current (March 2020) Google images are based on Debian, but on a minimized, slimmed down version of it.
Containers are not virtual machines; you don’t need huge OS binaries to run your application. You don’t need ls, grep, find, cat, or even bash in your container to run a Java/Go/Node app. If you are used to logging into your containers and playing with bash as root, which is highly discouraged in general, you’re out of luck: with distroless it’s simply impossible.
Do you really need grep, ls, or bash in your production container image?
Booo, there’s no Linux shell here! It just runs Java apps on top of a slimmed down Debian instance.
A typical container consists of:
- Distro base layer – Linux distribution files (Ubuntu, CentOS, Debian)
- Runtime layer (JRE for Java, Python runtime, glibc for C++)
- Application layer – actual application binaries
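To make the three layers concrete, here is a minimal sketch that writes a Dockerfile in which each instruction corresponds to one of them. The base image tag, the package name and `app.jar` are illustrative assumptions, not taken from a specific project.

```shell
# Write a Dockerfile that mirrors the three layers listed above.
mkdir -p typical-image
cat > typical-image/Dockerfile <<'EOF'
# Distro base layer: a full Linux distribution
FROM ubuntu:20.04

# Runtime layer: the JRE installed through the distro package manager
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      openjdk-11-jre-headless \
 && rm -rf /var/lib/apt/lists/*

# Application layer: the actual application binaries (hypothetical app.jar)
COPY app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]
EOF

# With Docker installed and an app.jar in place, build and inspect the layers:
#   docker build -t typical-java-app typical-image
#   docker history typical-java-app
```

`docker history` lists the size contributed by each instruction, which is a quick way to see how much of the image is distribution and runtime rather than application.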
An even more optimized container looks like this (runtime optimization is the subject of another article, but it’s worth noting that you can also shrink the Java JRE or the Python runtime):
Compare the image sizes for a second.
Scanning containers for known vulnerabilities is a recommended practice, and in many cases simply an obligatory one (compliance etc.).
A better signal to noise ratio for container security scanners is one of the important reasons to use distroless containers. There are also fewer files to scan, so scanning consumes less I/O and CPU. What does this really mean?
Usually, the default distributions come with tens of security warnings and issues, which are often ignored. That is a risky strategy, because hidden among them (the noise) can be a few significant issues that jeopardize your application’s security.
With cleaner container images, which generate close to zero warnings and errors during the scanning process, you can clearly see things that matter, so the signal to noise ratio is much higher.
Let’s scan our images using clair-scanner. First, start the Clair vulnerability database and the Clair server:
docker run -d --name db arminc/clair-db
docker run -p 6060:6060 --link db:postgres -d --name clair arminc/clair-local-scan
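With the database and the Clair server running, each image still has to be pulled and handed to the scanner. The sketch below only prints the commands, so it can be reviewed before anything runs. It assumes that clair-scanner is installed on the host, that Clair listens on port 6060 as started above, and that `HOST_IP` is an address the Clair container can reach (the default used here is the usual Docker bridge address on Linux). The image names are examples only.

```shell
# Assumed values: adjust HOST_IP and IMAGES to your environment.
HOST_IP="${HOST_IP:-172.17.0.1}"
IMAGES="openjdk:8 gcr.io/distroless/java"

# Print the commands for one image: pull it locally, then scan it.
scan_commands() {
  printf 'docker pull %s\n' "$1"
  printf 'clair-scanner --ip %s --clair http://localhost:6060 %s\n' "$HOST_IP" "$1"
}

for image in $IMAGES; do
  scan_commands "$image"
done
```

Piping the output to `sh` executes the printed commands once they look right.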
The official openjdk image and the distroless image have zero detected vulnerabilities; the other images have multiple detected vulnerabilities.
Do it yourself
Distroless images are here. Google, which seems to have started this movement, publishes its own distroless images for Java, Python, Go and C++.
Note that you do not (usually) have a package manager in a distroless base image (which can be shocking at first). Again, no apt or pip; you have to install the dependencies in another way. One way is to use a multi-stage build, a Docker feature which allows us to copy the results (files) created by a package manager in one stage into our target distroless based image.
So we can have both: the package manager in a temporary Docker image, and a flattened file system resulting in a smaller final image.
Like everything you do yourself, the question of maintenance arises. Who will update your local distroless image, when, and how? This has to be addressed; otherwise, without proper maintenance and base dependency updates, it will rust very quickly and defeat the purpose of its creation.
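A multi-stage build along these lines could look like the following sketch for a Python application. It assumes a single `main.py` with a `requirements.txt` next to it; the image tags are examples, and the Python version of the build stage should match the one inside the distroless image, otherwise compiled dependencies may not load.

```shell
# Write a two-stage Dockerfile: pip runs in stage 1, only files reach stage 2.
mkdir -p multistage-demo
cat > multistage-demo/Dockerfile <<'EOF'
# Stage 1: temporary image that still has pip
FROM python:3.11-slim AS build
WORKDIR /app
COPY requirements.txt .
# Install dependencies into a plain directory that can be copied as files
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
COPY main.py .

# Stage 2: distroless runtime, no shell and no package manager
FROM gcr.io/distroless/python3-debian12
WORKDIR /app
# Only the files produced by the first stage are carried over
COPY --from=build /app /app
ENV PYTHONPATH=/app/deps
# The distroless Python image already uses the interpreter as its entrypoint
CMD ["main.py"]
EOF

# Build with: docker build -t my-python-app multistage-demo
```

The first stage and everything pip left behind in it are discarded; only the `/app` directory ends up in the final image.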
Alpine Linux is a tiny Linux distribution which is only 4 MB in size.
Alpine is built on musl libc and busybox rather than GNU glibc, and the lack of glibc means trouble for software that expects it.
For instance, the official JDK doesn’t run on musl libc. There is a port called Portola, but it’s certainly not as well tested and reliable as the official JDK distributions. It’s hard to imagine running critical business applications on a prototype. Of course, there is news that a new version is working or soon will be.
For Python, for instance, Alpine meant that containers had to download all the dependencies (and often build them, because prebuilt glibc based binaries could not be used), and as a consequence they became bigger with the Alpine base image than without it (sic!).
Additionally, the Alpine Linux image contains busybox tools, apk (the package manager) and other binaries you simply don’t need to run your application.
Alpine with glibc? Yes, there are containers based on Alpine plus glibc libraries. Still, it’s not as easy as Alpine fans would like it to be.
As you could see in the example above, there are readily available ‘slim’ versions of the official images for any given language runtime, which are well worth investigating. However, they often do have a shell and command line tools, which are not good for container security.
Sometimes they contain excess binaries you don’t need while lacking ones you do need, which can break your runtime dependencies.
Minimize image size using tools
Another approach is to use a standard base image (not minimized) and then use automatic tools to detect dependencies and remove files that are not needed. For example, minicon analyzes an existing container given some hints, for instance the commands that will need to run, and based on those hints it is able to significantly reduce the image size of the container.
The risk, however, is that automated dependency detection may fail and result in a runtime container unable to run your application. I endured tiresome trial and error efforts in order to fix runtime errors (file not found, dependent library missing etc.) by adding missing libraries, configuration files and environment settings.
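To get a feel for what such dependency detection involves, the sketch below lists the shared libraries a binary is linked against using `ldd`. This is not how minicon works internally; it is only a manual approximation on a glibc based Linux system, and `list_shared_libs` is a hypothetical helper.

```shell
# List the shared libraries a binary is linked against (resolved paths only).
list_shared_libs() {
  ldd "$1" 2>/dev/null | awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Example: dependencies of the local ls binary (output differs per system)
list_shared_libs "$(command -v ls)"
```

Static linkage information like this misses libraries loaded at runtime, configuration files and environment settings, which is one reason why automated minimization can produce an image that fails to start.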
Based on the research, I’ve identified the following typical answers to the problem of container size and attack surface optimization:
- I don’t need it – I don’t want to risk the stability for some MBs of storage space, it’s not worth it, I’ll use the official full images which I trust.
- I use Alpine and / or :slim images – it’s more adventurous, but when viable I just pick Alpine or a slim image as my base; let the others spend time on optimization.
- I use distroless with standard runtime image – I like those Google distroless images and use them on a regular basis as an entry point for my Java/Python/Go containers.
- I use distroless with an optimized runtime image – I am a container optimization pro, the effort is there, but it’s worth it.
In which camp are you? What is the trend in your opinion?
Of course it depends on the specific problem, project, etc. But there are statistics and practice. Please identify your camp and explain why.
Our experts, who work on complex solutions in the pharma, insurance, banking and industrial sectors, share their views.
Felix Hassert, Avenga, Director of Products & Hosting
In the Go(lang) environment distroless images have been around for quite a while. Go usually compiles ‘almost statically’ linked fat binaries. This makes it especially easy to use distroless containers.
The build runs in a regular builder image; the produced binary is copied into a scratch image afterwards.
The `FROM scratch` directive is not only distroless, it’s literally empty. A smaller footprint, in terms of both size and attack surface, isn’t even possible.
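A minimal sketch of such a build is shown here. It assumes that the build context is a Go module with its main package in the root directory; the `golang` tag and the paths are examples, not a description of the setup actually used.

```shell
# Write a two-stage Dockerfile: compile with the Go toolchain, ship on scratch.
mkdir -p scratch-demo
cat > scratch-demo/Dockerfile <<'EOF'
# Stage 1: full Go toolchain, used only for compiling
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 yields a statically linked binary with no libc dependency
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: literally empty base image
FROM scratch
COPY --from=build /out/app /app
# Only needed when the binary makes outbound TLS calls
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
ENTRYPOINT ["/app"]
EOF

# Build with: docker build -f scratch-demo/Dockerfile -t my-go-app <module-dir>
```

Apart from the optional CA bundle, the resulting image contains exactly one file: the application binary.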
In times where everyone pushes their branches early-and-often and CI/CD is in place, faster downloads and less space used in the Docker Registry are a plus on their own. But more important are the positive effects of enforced discipline:
If nothing is in the base image, you have to bring everything needed deliberately. There are no hidden dependencies. Your software won’t break if lib A forces you to upgrade the base image that could come with an incompatible version of lib B.
It is much easier to follow security best practices too. Without files, there is no hassle with file permissions. Just run your entrypoint with an unprivileged user id and make the root folder unwriteable. This will avoid a whole class of file system related errors. You simply cannot write to a disk without a Docker volume. There is no temptation to write logs anywhere else than to stdout/stderr.
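These practices map onto standard `docker run` options. The sketch below prints such a command instead of running it; the image name is hypothetical, and 65534 is used as an example of an unprivileged user id (conventionally `nobody`).

```shell
# Hypothetical image name; override with IMAGE=... if needed.
IMAGE="${IMAGE:-my-go-app:latest}"

# Print a hardened run command:
#   --read-only   mounts the container root file system read-only
#   --user        runs the entrypoint with an unprivileged uid:gid
#   --cap-drop    removes all Linux capabilities
hardened_run() {
  printf 'docker run --rm --read-only --user 65534:65534 --cap-drop ALL %s\n' "$1"
}

hardened_run "$IMAGE"
```

Piping the output to `sh` runs the container. Anything the application needs to persist then has to go to an explicitly mounted volume.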
However, even for Go software it can be tricky to run in a scratch container. For example, if you need to process HTTPS traffic, you need to bring the CA certificates yourself. Debugging can also be tricky, because you don’t have a shell to execute, but Docker allows you to start a full fledged debugging container (with shell, strace and other handy tools) and attach that to the process space of the specimen container:
docker run -it --rm --pid=container:<go-container> --cap-add SYS_PTRACE alpine sh
That said, the distroless trend is absolutely the right direction.
Vladyslav Litovka, DevOps expert
On recent projects we’ve moved to using multistage builds and minified containers; mostly Alpine based but also distroless. In some cases it was forced by customer requirements (like Andriy said), and in some cases it was our decision. Anyway, project wide I can say that in most cases it’s option 1, in some option 2 and 3.
Great article. Good and simple explanations. It contains the exact amount of technical details which allows us to understand the main purpose and implementation direction in just 15 minutes of a reader’s time.
Andrew Petryk, Java Engineering Manager
Overall, it seems to be a trend – everything is becoming smaller. I remember the days when deploying a .war to a Java Web Application Server was a several minute task, whereas now micro-frameworks fight for sub-second startup time. We have made quite good progress there. The same goes for Docker images, but the fight is for MBs rather than seconds.
I can’t say that I am solidly in one of the camps Jacek mentioned above (except for camp 1 🙂 ) because, as it usually happens in IT, it depends on many factors. But for sure, distroless images are a thing and you definitely should give it a try.
Containers are everywhere and knowing how to use them efficiently is a key skill for every tech organization. We are happy to share a distroless intro and our experts’ opinions on the subject.
If you are new to the topic, I hope you see now that distroless is worth trying. If you have been a distroless fan for some time already, you are definitely not alone.