What are your users really doing within GitHub Actions?
One of the first questions to answer when building out GitHub Actions compute on premises is “how do I know what my users are doing?”
In an old-school persistent-machine setup, this isn’t a problem at all - for Actions or any other system. Install the company’s anti-virus program, endpoint protection stuff, logging stack, etc. like literally every other machine on the network and everyone is good to go. Likewise, if continuous integration is already a service, a lot of that risk is already handled for you by that SaaS provider. Once we combine putting these jobs into ephemeral containers and self-hosting this platform, the question gets a lot harder to answer.
I’m far more offended by curl -k | bash than by any attempt at container escape. Disabling SSL verification is never the answer.
This is a difficult situation for an actions-runner-controller setup due to several compounding factors.
- The jobs are run in ephemeral containers, so they go away after each and every job. That prevents cross-contamination between jobs, but it also makes logs hard to gather after the fact. Even if you capture the pod logs, they aren’t a definitive record of everything that was done - only what was printed to STDOUT and STDERR from the pod.
- GitHub Actions isn’t built for that kind of observability. It’s fundamentally a tool to run jobs, and it will print logs that are useful for task-level debugging, but it won’t give you deep visibility into the infrastructure the job runs on or a broad understanding of all jobs being run. Given the huge variety of stuff that an agent can be installed on, that’s an impossible problem for it to solve within itself.
- Because this is (usually) a co-tenanted Kubernetes cluster - meaning that several teams within a company are sharing resources - keeping everyone in the boundaries of their cozy pod is important for the security and integrity of the entire build system.
- And privileged pods are very common when running build jobs in Kubernetes, for reasons covered previously, which makes it easier to escape the pod and do silly things. 🤡
We need some information from our infrastructure to create a complete picture of who’s doing what. In the cluster setup (part 1), we added a custom container network interface and installed Cilium and Hubble to start our journey into Kubernetes observability. Now we’re going to use those, plus Tetragon, to get a customizable look at what users are really doing inside of our runners. We can know things like the following (there’s a sample raw event after this list):
- Process starts, arguments, exits and exit codes
- File opens/closes, reads and writes
- Network connections established
- Capability escalations
- Syscalls made from within the container
- and so much more
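Before installing anything, it helps to know what the raw data looks like. A single Tetragon process-exec event is roughly the JSON below - heavily abridged and hand-written for illustration, so treat the exact fields as an approximation that varies by Tetragon version and configuration.

{
  "process_exec": {
    "process": {
      "binary": "/usr/bin/whoami",
      "arguments": "",
      "pod": {
        "namespace": "runners",
        "name": "defaults-xh5cc-runner-8w4hb",
        "container": { "name": "runner" }
      }
    },
    "parent": { "binary": "/bin/bash" }
  },
  "node_name": "worker-node-1",
  "time": "2024-02-16T22:11:43Z"
}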
Installing Tetragon
First, install Tetragon into your cluster. Continuing from the previous parts, it’s only the command below, since we’ve already added Cilium’s Helm repository.
# Install tetragon
helm install tetragon cilium/tetragon -n kube-system --version 1.1.2
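Before moving on, it’s worth making sure the Tetragon agent actually rolled out to every node. Assuming the chart’s default DaemonSet name of tetragon, something like this will confirm it:

# Confirm the Tetragon DaemonSet is running on every node
kubectl rollout status daemonset/tetragon -n kube-system
kubectl get pods -n kube-system -l app.kubernetes.io/name=tetragon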
Install the tetra CLI
Tetragon will output raw JSON just fine - and if you already know this just needs to be shipped into your SIEM, there’s probably not much need for looking at things locally. To get the pretty output at the CLI, we need the local CLI utility. Go to the link below, download the latest release for your architecture, and install it.
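If you’d rather script the download, something along these lines should work on Linux and macOS - the release asset naming is an assumption based on how Tetragon currently publishes its binaries, so verify it against the releases page:

# Download the latest tetra CLI for this OS/architecture and install it
OS=$(uname | tr '[:upper:]' '[:lower:]')   # linux or darwin
ARCH=amd64                                 # or arm64 on Apple Silicon / ARM servers
curl -LO https://github.com/cilium/tetragon/releases/latest/download/tetra-${OS}-${ARCH}.tar.gz
sudo tar -C /usr/local/bin -xzf tetra-${OS}-${ARCH}.tar.gz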
Next, if you’re on a Mac, you might need to tell it to let you launch it.
xattr -dr com.apple.quarantine /usr/local/bin/tetra
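With the quarantine attribute cleared, a quick check confirms the binary runs and is on your PATH:

# Verify the tetra CLI is installed
tetra version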
Figure out what you want to know
I’m rolling with the defaults, plus privileged access and TCP network connectivity. There are tons of other examples to use here - for the sake of simplicity, and to avoid too much noise in the logs for a proof of concept, I’m going to omit file access. I think trying to understand every read/write, open, and close of every file could get uniquely noisy in build jobs, too, versus other uses of containers.
This configuration will tell me the following
- Which processes are run, with what arguments, and their exit codes
- Which network connections are established and where
- If any pods are executing with elevated privileges (capabilities or host namespaces)
Let’s turn on privileged access by following these directions.
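For reference, at the time of writing those directions amount to flipping a couple of Helm values and letting the DaemonSet restart - treat the exact value names as an assumption and double-check them against the docs for your chart version:

# Turn on process credential and namespace visibility for privileged-execution events
helm upgrade tetragon cilium/tetragon -n kube-system --version 1.1.2 \
  --set tetragon.enableProcessCred=true \
  --set tetragon.enableProcessNs=true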
Then, enable network logging for TCP connections and DNS requests with the following TracingPolicy CRDs.
# TCP network connectivity CRD
kubectl apply -f https://raw.githubusercontent.com/cilium/tetragon/main/examples/tracingpolicy/tcp-connect.yaml
# Open DNS requests CRD
kubectl apply -f https://raw.githubusercontent.com/cilium/tetragon/main/examples/tracingpolicy/open_dnsrequest.yaml
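These policies are cluster-scoped custom resources, so a quick get confirms they landed (assuming the CRD’s plural name of tracingpolicies, as in current Tetragon releases):

# List the TracingPolicy objects Tetragon is now enforcing
kubectl get tracingpolicies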
Dodgy users are up to no good
Start streaming the logs into stdout and pipe them into the Tetra CLI for the runners namespace.
kubectl logs -n kube-system -l app.kubernetes.io/name=tetragon -c export-stdout -f | tetra getevents -o compact --namespaces ghec-runners
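If you’d rather keep the raw JSON for your SIEM instead of the compact view, drop the pipe to tetra and send the same stream to a file or your log shipper of choice:

# Capture the raw JSON event stream instead of the compact view
kubectl logs -n kube-system -l app.kubernetes.io/name=tetragon -c export-stdout -f > tetragon-events.jsonl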
Start the job we created in part 2 that creates an idle pod for an hour to do random fun stuff inside. Now, let’s exec in and do some shifty shenanigans!
kubectl exec -i -t -n ghec-runners defaults-fsmdn-runner-47qck -c runner -- sh -c "clear; (bash || ash || sh)"
And here’s the output of some fun commands within the pod.
root@defaults-xh5cc-runner-8w4hb:/actions-runner# whoami
root
root@defaults-xh5cc-runner-8w4hb:/actions-runner# ls -la /
total 80
drwxr-xr-x 1 root root 4096 Feb 16 22:11 .
< ... lots more stuff got truncated ... >
drwxr-xr-x 1 root root 4096 Feb 8 13:00 var
root@defaults-xh5cc-runner-8w4hb:/actions-runner# mount /dev/sda2
mount: /dev/sda2: can't find in /etc/fstab.
And here’s what’s streamed to the logs immediately - our presence has been noted!
🚀 process runners/defaults-xh5cc-runner-8w4hb /bin/sh -c "clear; (bash || ash || sh)"
🚀 process runners/defaults-xh5cc-runner-8w4hb /usr/bin/clear
💥 exit runners/defaults-xh5cc-runner-8w4hb /usr/bin/clear 0
🚀 process runners/defaults-xh5cc-runner-8w4hb /bin/bash
📤 sendmsg runners/defaults-xh5cc-runner-8w4hb /actions-runner/bin/Runner.Worker tcp 10.0.5.240:0 -> 13.107.42.16:443 bytes 35
🧹 close runners/defaults-xh5cc-runner-8w4hb /actions-runner/bin/Runner.Worker tcp 10.0.5.240:0 -> 13.107.42.16:443
🧹 close runners/defaults-xh5cc-runner-8w4hb /actions-runner/bin/Runner.Worker tcp 10.0.5.240:0 -> 13.107.42.16:443
🚀 process runners/defaults-xh5cc-runner-8w4hb /usr/bin/whoami
💥 exit runners/defaults-xh5cc-runner-8w4hb /usr/bin/whoami 0
📤 sendmsg runners/defaults-xh5cc-runner-8w4hb /actions-runner/bin/Runner.Listener tcp 10.0.5.240:62695 -> 13.107.42.16:443 bytes 2370
📤 sendmsg runners/defaults-xh5cc-runner-8w4hb /actions-runner/bin/Runner.Listener tcp 10.0.5.240:1256 -> 13.107.42.16:443 bytes 2533
🚀 process runners/defaults-xh5cc-runner-8w4hb /bin/ls -la /
💥 exit runners/defaults-xh5cc-runner-8w4hb /bin/ls -la / 0
📤 sendmsg runners/defaults-xh5cc-runner-8w4hb /actions-runner/bin/Runner.Worker tcp 10.0.5.240:0 -> 13.107.42.16:443 bytes 35
🧹 close runners/defaults-xh5cc-runner-8w4hb /actions-runner/bin/Runner.Listener tcp 10.0.5.240:0 -> 13.107.42.16:443
🚀 process runners/defaults-xh5cc-runner-8w4hb /bin/mount /dev/sda2
💥 exit runners/defaults-xh5cc-runner-8w4hb /bin/mount /dev/sda2 1
The full log, if you’re interested, is here.
Next
Automating our runner deployments to make development easier - part 4