Searching for secrets in container images
🙈 Yet another place to look for secrets? It’s a common finding as teams move their workloads into containers and navigate the security challenges that come with that move. Yet it’s also puzzling … how does a container scanner find an API key that isn’t in the finished image? Let’s create a container with a “secret”, then go over a few ways to extract it from the image. 🕵🏻‍♀️
A finished container looks much like this:
a container with multiple layers, and a secret used in build hidden in one layer
What you see when you shell into a running container (or mount it as a file system on your laptop) is only that top “finished” layer. As we discussed earlier when tidying our large container images, every single step of the container build becomes a layer that’s downloaded as one image. It’s why being mindful of how we’re using those layers is so impactful to the finished size. It also makes it easy to assume you’re safer than you really are when using long-lived credentials.
Let’s hide something!
If you’re not interested in making your own image with a hidden secret, jump ahead to looking through image layers. Pull the image at ghcr.io/some-natalie/some-natalie/secret-example:latest to follow along in the next sections.
First, let’s make a file named secret.txt that has a “secret” in it.
this is super secret serious business
actually, it's probably a boring api key
with root access, naturally
Next, make a container file that puts the file in, then removes it in the next layer.
FROM cgr.dev/chainguard/wolfi-base:latest
COPY secret.txt /not-a-secret.txt
RUN rm -rf /not-a-secret.txt
Build it, then let’s start to pick it apart.
Launch it and open a terminal inside of that process to verify that the file isn’t easily visible.
~ ᐅ docker run --rm -it ghcr.io/some-natalie/some-natalie/secret-example:latest
8be11d233bda:/# ls -lah /
total 12K
drwxr-xr-x 1 root root 20 Apr 16 01:49 .
drwxr-xr-x 1 root root 20 Apr 16 01:49 ..
-rwxr-xr-x 1 root root 0 Apr 16 01:49 .dockerenv
lrwxrwxrwx 1 root root 7 Mar 31 13:08 bin -> usr/bin
drwxr-xr-x 5 root root 340 Apr 16 01:49 dev
drwxr-xr-x 1 root root 12 Apr 16 01:49 etc
drwxr-xr-x 1 root root 14 Jan 1 1970 home
drwxr-xr-x 1 root root 564 Jan 1 1970 lib
lrwxrwxrwx 1 root root 3 Mar 31 13:08 lib64 -> lib
drwxr-xr-x 1 root root 0 Jan 1 1970 opt
dr-xr-xr-x 278 root root 0 Apr 16 01:49 proc
drwx------ 1 root root 24 Apr 16 01:49 root
drwxr-xr-x 1 root root 0 Jan 1 1970 run
lrwxrwxrwx 1 root root 7 Mar 31 13:08 sbin -> usr/bin
dr-xr-xr-x 11 root root 0 Apr 14 03:34 sys
drwxrwxrwt 1 root root 0 Jan 1 1970 tmp
drwxr-xr-x 1 root root 10 Jan 1 1970 usr
drwxr-xr-x 1 root root 56 Jan 1 1970 var
Notice how that /not-a-secret.txt file isn’t present? That doesn’t mean a scanner’s alert about it is a false alarm. Don’t dismiss this finding.
Look through image layers
There are a few tools that make visualizing changes at each layer simple. My favorite, a text user interface, is dive (GitHub). It lets you explore each layer’s build step and the changes it made, extract files, and more.
dive, continuing to be awesome
But … let’s do this the manual way to really know what we’re doing here. Then, we can understand what a scanner is doing to return “found secrets” in an image.
Extracting secrets from images
Remember how OverlayFS, the foundation for a container’s filesystem, works? Containers are (basically) some JSON and file system layers wrapped up in a tarball. Crack open that tarball and explore!
# convert the image into a tarball
docker save ghcr.io/some-natalie/some-natalie/secret-example:latest -o secrets.tar
# open up the tarball
mkdir secret-example
tar xf secrets.tar --directory=secret-example
cd secret-example
The directory structure isn’t human-friendly, but the secret is hidden somewhere in here.
.
├── blobs
│   └── sha256
│       ├── 44276f079bfa2a573e96b47f2de9b619ba3af8296f2e56c8a3179aa366b620a8
│       ├── 5d9ca308799efeead16b21fa95dc3ee54d34eb6e91fc1bd5a7ba7f2f68cd52f9
│       ├── 849b2beeb8b817251ac75a4bf897ef4ff4413faac65417b9d20a502aebebaf4e
│       ├── bcfdc337a6da7ceaae240acda0d8c51d3aa796e80f1bd2f9f34c1f0c5a9f32d3
│       ├── c17fffd7182302ee1e4bbf0511d749bcd423ea45c374243577795b5a1baae8d6
│       ├── d688176b49d54a555e2baf2564f4d3bb589aa34666372bf3d00890a244004d02
│       ├── ed5970c83cd47c1004fafa5577356a1e41eb0347d4408d77f5cf01762f810d66
│       └── f8d9d69c53752e0ac2a555f782a6155f6ef77ca3a1911408f51280d8a3cc6bae
├── index.json
├── manifest.json
├── oci-layout
└── repositories

3 directories, 12 files
🗺️ To help make sense of it, let’s look at the manifest.json file. It’s a map of everything in this archive. This is a simple image, with three layers and no extra metadata attached to it. The file has been truncated, but we now have the list of layers to look through. [1]
[
  {
    "Config": "blobs/sha256/d688176b49d54a555e2baf2564f4d3bb589aa34666372bf3d00890a244004d02",
    "RepoTags": ["ghcr.io/some-natalie/some-natalie/secret-example:latest"],
    "Layers": [
      "blobs/sha256/bcfdc337a6da7ceaae240acda0d8c51d3aa796e80f1bd2f9f34c1f0c5a9f32d3",
      "blobs/sha256/f8d9d69c53752e0ac2a555f782a6155f6ef77ca3a1911408f51280d8a3cc6bae",
      "blobs/sha256/849b2beeb8b817251ac75a4bf897ef4ff4413faac65417b9d20a502aebebaf4e"
    ]
  }
]
Looking at each layer now
A small trick: each of those layers is also a tarball, just without the file extension. Now that we know which files contain layer data, it’s a simple task to untar them, then grep through the contents for the string or regex that we’re looking for. The contents of the plain text files copied in are in the layers … in plain text … secrets and all.
this is super secret serious business
actually, it's probably a boring api key
with root access, naturally
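The untar-and-grep step described above can be sketched roughly like this. It assumes you are pointing it at the directory where the docker save archive was unpacked, and the function name find_in_layers is made up for illustration. It is one way to do the search, not necessarily the exact commands used to produce the output above.

```shell
# Search every layer blob in an unpacked `docker save` archive for a pattern.
# Usage: find_in_layers <unpacked-dir> <grep-pattern>
# (find_in_layers is a hypothetical helper name, not part of any tool.)
find_in_layers() {
  dir="$1"
  pattern="$2"
  for blob in "$dir"/blobs/sha256/*; do
    # Config and manifest blobs are JSON, not tarballs. tar fails to list
    # them, so they are skipped.
    if tar -tf "$blob" >/dev/null 2>&1; then
      # -O streams the contents of every file in the layer to stdout
      # instead of writing them to disk; grep -a treats it all as text.
      if tar -xOf "$blob" 2>/dev/null | grep -a -q -e "$pattern"; then
        echo "match in layer: $(basename "$blob")"
        tar -xOf "$blob" 2>/dev/null | grep -a -e "$pattern"
      fi
    fi
  done
}
```

From the parent of the secret-example directory created earlier, something like find_in_layers secret-example "super secret" would print the name of any layer blob containing that string, followed by the matching lines.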
A container scanner or “secrets finder” works much like what we just did manually. It searches every layer of an image for strings that match regular expressions for known credential formats, or for high-entropy (read: very random) strings.
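To make the regular expression half of that concrete, here is a tiny sketch with just two illustrative patterns: the shape of an AWS access key ID and a PEM private key header. The function name scan_for_secrets is hypothetical, and real scanners ship far larger rule sets plus entropy checks, which this does not attempt.

```shell
# Read text on stdin and print (with line numbers) any line that matches
# one of a few well-known secret shapes. Illustrative patterns only.
scan_for_secrets() {
  grep -a -n -E \
    -e 'AKIA[0-9A-Z]{16}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----'
}
```

Piping the contents of a layer into it (for example, the output of tar -xOf on a layer blob) would flag any lines that look like those credentials.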
Even if our “end process” can’t see that credential, anyone who can run the image must also be able to pull it, meaning they can read every file that was ever written into one of its layers. This often means credentials or API tokens used in the build, but it’s a common exfiltration path for proprietary source code too.
Parting thoughts
Exfiltration of sensitive information and credentials in containers is an easy step to overlook in software distribution. Scanners may help find things, but they will not block them. Prevent this risk by:
- Not using long-lived credentials like API keys or passwords or deploy keys.
- Not putting them in your build … ever.
- Deleting them during the build, then squashing your final image to ship a single-layer image.
- Using build secrets, where your builder supports them, so the builder can access the credential without storing it in the image.
- If you’re worried about source code leaking, use a multi-stage build. Build in one stage, then copy the finished artifacts over to another one. This also reduces the finished image size.
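As a rough illustration of the build secrets idea, here is a sketch using Docker BuildKit’s secret mounts. The file name Dockerfile.buildsecret, the secret id api_key, the image tag, and the helper function name are all assumptions made up for this example, and other builders have their own equivalents.

```shell
# Write a hypothetical Dockerfile that reads the secret from a BuildKit
# secret mount instead of COPYing it into a layer.
cat > Dockerfile.buildsecret <<'EOF'
# syntax=docker/dockerfile:1
FROM cgr.dev/chainguard/wolfi-base:latest
# The secret is mounted at /run/secrets/<id> for this RUN step only and
# is not written into the resulting layer.
RUN --mount=type=secret,id=api_key \
    wc -c /run/secrets/api_key
EOF

# Hypothetical helper wrapping the matching build command. It needs Docker
# with BuildKit and the secret.txt file from earlier, so it is only defined
# here, not run.
build_with_secret() {
  docker build \
    --secret id=api_key,src=secret.txt \
    -f Dockerfile.buildsecret \
    -t secret-example:buildsecret .
}
```

Running build_with_secret and then repeating the layer search from earlier should, under these assumptions, come up empty, since the secret never lands in a layer.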
basically the security of every long-lived credential
Footnotes
1. You can also get the list of layers directly from the Docker CLI using a little bit of jq wizardry with this one-liner:
docker image inspect ghcr.io/some-natalie/some-natalie/secret-example:latest | jq '.[].GraphDriver.Data.UpperDir + ":" + .[].GraphDriver.Data.LowerDir | split(":") | reverse'