Build multi-architecture runners for actions-runner-controller
When I started out building images for actions-runner-controller, I only built them for amd64. This was fine until I got a new job working with folks on ARM instead. I still use this project to simulate other container workloads and demonstrate different parts of the software supply chain, so this seemed like a simple, valuable addition.
The end goal is to have a single GitHub Actions workflow that will regularly build runner images using multiple CPU architectures. The finished workflow, for the impatient. 😇
I’ve also been spending less time actively maintaining this project, so anything that makes it more automated helps me keep using it somewhat safely. The priorities here are minimizing upkeep time and staying recent enough to not be horrifyingly insecure.
Regular builds
The first step is to build the images regularly. This automatically happens once a week via cron scheduling.
name: 🍜 Build/publish all runners

on:
  workflow_dispatch: # build on demand
  schedule:
    - cron: "43 6 * * 0" # build every Sunday at 6:43 AM UTC
I usually prefer using semver for versioning my projects, but found it to be a poor fit for this use case. These images can be stored indefinitely for retention purposes, but they’re not meant to be used for long periods of time either. The build pipelines around these images don’t change substantially, but the images do get rebuilt often to stay up-to-date.
This uses both the short SHA of the commit and latest as the tags instead. It’s also not unreasonable to use the date of the build as an image tag in this case.
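If you’d rather tag by build date, the docker/metadata-action used later in this workflow can generate one on scheduled runs. A minimal sketch, not part of this project’s workflow:

# sketch only: add a date-stamped tag on cron-triggered builds, plus latest
tags: |
  type=schedule,pattern={{date 'YYYYMMDD'}}
  type=raw,value=latest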
Set minimal CI permissions
Next, set the permissions for the just-in-time token to do everything we’re asking it to do here and no more. Each permission has a comment to justify itself.
jobs:
  build-all:
    runs-on: ubuntu-latest # use the GitHub-hosted runner to build the image
    permissions:
      contents: write # for uploading the SBOM to the release
      packages: write # for uploading the finished container
      security-events: write # for github/codeql-action/upload-sarif to upload SARIF results
      id-token: write # to complete the identity challenge with sigstore/fulcio when running outside of PRs
    strategy:
      matrix:
        os: [rootless-ubuntu-jammy, rootless-ubuntu-numbat, ubi8, ubi9, wolfi]
    continue-on-error: true
Each operating system is built in parallel using the matrix strategy. There are currently five operating systems in the matrix, with their names corresponding to the Dockerfile in the ~/images directory. It’s easy to change when needed.
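For example, adding another image is just a new Dockerfile plus one more entry in the matrix:

strategy:
  matrix:
    # "rootless-ubuntu-noble" is a hypothetical addition that assumes a matching
    # images/rootless-ubuntu-noble.Dockerfile exists in the repository
    os: [rootless-ubuntu-jammy, rootless-ubuntu-numbat, rootless-ubuntu-noble, ubi8, ubi9, wolfi]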
Boring setup
The initial setup before building is delightfully boring. In order, we’re:
- Checking out the repository
- Setting the version of the image to the tag of the release (if it’s a release). This is always skipped in this workflow because it doesn’t run on release, but I’ve left it for reference.
- Setting the short SHA of the commit as a short, (usually) unique identifier.
- Logging into the GitHub Container Registry. Swap in whatever registry the final image gets pushed to when adapted for your own use.
steps:
  - name: Checkout the repo
    uses: actions/checkout@v4

  - name: Set version
    run: echo "VERSION=$(cat ${GITHUB_EVENT_PATH} | jq -r '.release.tag_name')" >> $GITHUB_ENV
    if: github.event_name == 'release'

  - name: Set short SHA
    run: echo "SHA_SHORT=${GITHUB_SHA::7}" >> $GITHUB_ENV

  - name: Login to GitHub Container Registry
    uses: docker/login-action@v3
    with:
      registry: ghcr.io
      username: ${{ github.actor }}
      password: ${{ secrets.GITHUB_TOKEN }}
Multiple CPU architectures
First, one design decision needs to be made. Can the compute you’re using to build the runner images build for multiple architectures?
If it can, you can use QEMU to build for multiple architectures. I’m doing this because this is an open-source project and I’m using free compute from GitHub Actions to build these containers. This can come at a performance penalty - emulation takes a lot longer to do.
If your compute can’t do this or build time is important, you’ll need to build natively for each architecture. I’d recommend a separate job per architecture, both to account for a different runner type and to handle any platform-specific quirks without clever YAML templating - e.g., build-all (producing 10 images over 5 runs, 2 images each) becomes build-aarch64 and build-amd64 (5 runs per job, 1 image per run). You’ll then be able to skip the QEMU setup step below.
flowchart LR
A(fab:fa-git-alt Dockerfile for runner,<br>dispatched to build)
A --> B
B{Can the compute build for multiple architectures?}
C(Set up QEMU,<br>one run per Dockerfile)
D(Build natively,<br>one run per image)
B -->|yes| C
B -->|no| D
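If you land on the “no” branch, the split might look roughly like the sketch below. It’s only a sketch: the arm64 runner label is a placeholder for whatever self-hosted or GitHub-hosted arm64 runner you actually have, and registry login is omitted for brevity.

# sketch of splitting into per-architecture jobs (no QEMU needed)
build-amd64:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      os: [rootless-ubuntu-jammy, rootless-ubuntu-numbat, ubi8, ubi9, wolfi]
  steps:
    - uses: actions/checkout@v4
    - uses: docker/setup-buildx-action@v3
    - uses: docker/build-push-action@v6
      with:
        file: ./images/${{ matrix.os }}.Dockerfile
        platforms: linux/amd64
        push: true
        tags: ghcr.io/some-natalie/kubernoodles/${{ matrix.os }}:latest-amd64

build-aarch64:
  runs-on: your-arm64-runner-label # placeholder - swap in a real arm64 runner label
  strategy:
    matrix:
      os: [rootless-ubuntu-jammy, rootless-ubuntu-numbat, ubi8, ubi9, wolfi]
  steps:
    - uses: actions/checkout@v4
    - uses: docker/setup-buildx-action@v3
    - uses: docker/build-push-action@v6
      with:
        file: ./images/${{ matrix.os }}.Dockerfile
        platforms: linux/arm64
        push: true
        tags: ghcr.io/some-natalie/kubernoodles/${{ matrix.os }}:latest-arm64

You’d then stitch the per-architecture tags into one multi-arch manifest (e.g., with docker buildx imagetools create). The rest of this post follows the QEMU path.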
In the YAML workflow, define the platforms. In our case, it’s linux/amd64 and linux/arm64. Literally any other QEMU-supported platform can be added here.
- name: Set up QEMU
  uses: docker/setup-qemu-action@v3
  with:
    platforms: linux/amd64,linux/arm64

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
Here’s where you can get fancy with tagging. I’ve chosen to be the least surprising. This leaves the architecture (mostly) invisible to folks, so latest is the same set of software from the same build but on any number of CPU architectures. In this case, latest can be two different images produced from the same codebase.
- name: Set Docker metadata
  id: docker_meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/some-natalie/kubernoodles/${{ matrix.os }}
    tags: |
      type=sha,format=long
      type=raw,value=${{ env.SHA_SHORT }}
      type=raw,value=latest
As shown above, one could store the CPU architecture or any number of other data points in the OCI image tag. Just keep in mind that there are limits to tag length that vary based on things you cannot control. Common oversights are the length of the FQDN of your internal registry and implementation quirks of clients used to interact with the image.
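As an illustration only (these aren’t tags this project ships), extra data points slot into the same metadata block. Registries commonly cap tags at 128 characters, and an architecture suffix only really makes sense when you build one image per architecture rather than a single multi-arch manifest:

# hypothetical extras for the docker/metadata-action step above
tags: |
  type=raw,value=${{ env.SHA_SHORT }}-jammy
  type=raw,value=${{ env.SHA_SHORT }}-arm64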
Lastly, actually building and pushing the containers is easy. Just note that the platforms here must match the ones set up earlier.
- name: Build and push the containers
  uses: docker/build-push-action@v6
  id: build-and-push
  with:
    file: ./images/${{ matrix.os }}.Dockerfile
    push: true
    platforms: linux/amd64,linux/arm64
    tags: ${{ steps.docker_meta.outputs.tags }}
Scan it
Next up, scan the image and upload the report. This step is a bit specific to using Anchore Grype as a scanner and uploading the results to GitHub. This is free for open-source projects, but obviously swap in whatever it is you’re really using.
- name: Scan it
  uses: anchore/scan-action@v6
  id: scan
  with:
    image: "ghcr.io/${{ github.repository }}/${{ matrix.os }}:${{ env.SHA_SHORT }}"
    fail-build: false # don't fail, just report

- name: Upload the container scan report
  id: upload
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: ${{ steps.scan.outputs.sarif }}
    wait-for-processing: true # wait for the report to be processed
Make an SBOM
Lastly, generate a Software Bill of Materials (SBOM) for the image. This is a list of all the software components in the image, which can be used for compliance, security, and other purposes.
- name: Generate that SBOM
  uses: anchore/sbom-action@v0
  with:
    image: "ghcr.io/${{ github.repository }}/${{ matrix.os }}:${{ env.SHA_SHORT }}"
    artifact-name: "${{ matrix.os }}-${{ matrix.arch }}-${{ env.SHA_SHORT }}.sbom"
In this case, we’re only uploading it to the workflow artifacts. Most folks will want to put it any number of other places instead, so adjust as needed.
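For instance, attaching it to a GitHub release could look something like the step below - a sketch that assumes a release-triggered run and that the SBOM has been written to a known local path:

# hypothetical follow-on step; the SBOM file path here is an assumption
- name: Attach the SBOM to the release
  if: github.event_name == 'release'
  env:
    GH_TOKEN: ${{ github.token }}
  run: gh release upload "${{ github.event.release.tag_name }}" "${{ matrix.os }}-${{ env.SHA_SHORT }}.sbom"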
Room for improvement and next steps
Some small places to improve this before production would be pinning all of the Actions above to their immutable SHAs. Right now, using tags (or worse, whatever the main branch happens to be) means what’s in use at each point in time is mutable. I added multi-architecture support to have some native compute for local builds and to provide reasonable workloads to test other integrations.
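Pinning looks like this - the SHA below is a placeholder, not a real commit:

# instead of a mutable tag...
- uses: actions/checkout@v4
# ...pin to the full commit SHA, keeping the tag as a comment for readability
- uses: actions/checkout@<full-40-character-commit-sha> # v4

Dependabot or Renovate can then keep those pinned SHAs bumped without manual effort.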