
Inspecting pull requests to understand behavior changes



I have a very bad habit to confess.

😇 I pin my third-party GitHub Actions to a SHA. [1] 😇

This ensures that a specific commit is running every time. It’s widely considered a security best practice (docs). It is also one of the easiest steps you can take to secure your CI system if you use third-party reusable Actions (or literally any open source software built on them, which is a lot).
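To see how far a repository is from that goal, it helps to list every `uses:` reference that is not pinned to a full 40-character commit SHA. The following is a minimal sketch, not part of the finished workflow: the function name `find_unpinned` is hypothetical, and it assumes unquoted `uses:` values and skips local (`./`) and `docker://` references.

```shell
#!/usr/bin/env bash
# List `uses:` references under a directory that are NOT pinned to a
# full 40-character commit SHA. The name `find_unpinned` is hypothetical.
find_unpinned() {
  local dir="$1"
  # 1. Grab each `uses:` value, stopping at whitespace or an in-line comment.
  # 2. Strip the `uses:` key itself.
  # 3. Skip local and docker:// references, which have no commit to pin.
  # 4. Drop anything that already ends in @<40 hex characters>.
  grep -rhoE 'uses:[[:space:]]*[^[:space:]#]+' "$dir" \
    | sed -E 's/^uses:[[:space:]]*//' \
    | grep -vE '^(\./|docker://)' \
    | grep -vE '@[0-9a-f]{40}$' \
    | sort -u
}
```

Running `find_unpinned .github/workflows` from the root of a repository prints one unpinned reference per line, or nothing if everything is pinned.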

😈 But I also blindly accept Dependabot pull requests without reviewing them. 😈

This completely negates the benefit of pinning to an immutable version to prevent software supply chain attacks in CI. Let’s stop doing that, shall we? 🙈

Let’s inspect the automated pull requests that Dependabot opens to update third-party GitHub Actions, along with all of the transitive dependencies that changed with them. Finished workflow for the impatient.

We’ll use an open source, air-gap capable YARA-based scanner that accepts many types of inputs with flexible outputs. It can run as a check before merging code, a promotion check to move into an offline system, or many other places during software development.

Using malcontent


To do this, we’re going to add a workflow that uses malcontent to check changes to our GitHub Actions. There is a lot of hidden complexity in the GitHub Actions ecosystem, so some assembly is required. Let’s start by using it on its own, then work up to having it review more complex code changes.

Take a minute to read the documentation on the project’s repo. There are a ton of nifty features I won’t be needing for this walkthrough.

A few important things to note:

  • You can use it in a container or install it natively. We’ll be using it in a container here, since that keeps it and all of its dependencies portable.
  • It can do a quick scan or a deeper analysis of an arbitrary set of files. It can also diff capabilities between two directories or binaries. Pull requests lend themselves naturally to differential analysis.
  • There are a couple of output formats, but none are standardized (e.g., SARIF or similar) for drop-in ingest into other tools. The JSON output is handy for machine parsing. Markdown is always pretty.

Some assembly required

From here, we’re going to

  1. Get a basic scan or two going locally
  2. Move that basic scan into GitHub Actions
  3. Run a differential analysis on a pull request in GitHub Actions as a check
  4. Combine it to analyze differences in a GitHub Actions version bump from Dependabot

This will hopefully catch attacks like the tj-actions and reviewdog compromises from earlier in 2025. This sort of attack is inherent to using third-party code in your build systems, as I learned during the Codecov attack of 2021.

1. Some basic scans

Let’s start locally and use the pre-packaged Docker container. This includes all dependencies and tens of thousands of YARA rules in the image, making it highly portable for air-gapped usage. It’s free to use and updated at least daily to save me from a ton of setup and update work too.

Scan a container image or folder

docker run --rm -v .:/tmp \
  cgr.dev/chainguard/malcontent:latest \
  --format=markdown --output=/tmp/container-scan.md \
  scan -i ghcr.io/some-natalie/jekyll-in-a-can:latest

Many GitHub Actions are shipped as containers, so we’ll use this capability. A few interesting things to note:

  • It doesn’t need the --privileged flag because it isn’t running the container or any container runtime. Under the hood, it’s using crane to interact with the container image as a tarball of filesystem snapshots.
  • The -i flag specifies it as an image and not a file.
  • To write the output to a file, it needs a volume mount. I did the lazy thing and used my current working directory as /tmp in the container. Obviously edit this if you’re not in an ephemeral environment (or playing around locally).
  • Full results of the above command.

Analyze a file

docker run --rm -v .:/tmp \
  cgr.dev/chainguard/malcontent:latest \
  --format=json --output=/tmp/malcontent.json \
  analyze /tmp/nc

This takes a deeper look into any particular file and enumerates what it can do. The full results of this command are in JSON in this case. It’s handy for:

  • Looking at compiled artifacts without access to source code, such as libraries you pull in during a build. 🕵🏻‍♀️
  • Using GitHub Actions or another compute-as-a-service platform to analyze the contents of a remote file store. Since those are ephemeral VMs, the security of this system is someone else’s problem. 🙊
  • Inspecting those MP3 files you definitely didn’t torrent in college. 😉 🎧

Diff the capabilities of two files or folders

docker run --rm -v .:/tmp \
  cgr.dev/chainguard/malcontent:latest \
  --format=json --output=/tmp/malcontent.json \
  diff /tmp/nc /tmp/libffmpeg.dirty.dylib

This is the foundation we’re going to build our pull request workflow from. It only looks at the differences from the first file to the second, meaning we can highlight only what’s changed. The full results from this command are available for further use, but we’ll use the Markdown output to comment on the real PR in GitHub.

2. Now in a CI pipeline

For our first workflow, let’s inspect an arbitrary file in a repository or remote storage bucket. For simplicity, use the built-in run summary system rather than uploading the report elsewhere. (full analyze workflow file)

# stuff above here, like cloning or copying or mounting a remote file share
      - name: Analyze the files
        id: malcontent
        run: |
          docker run --rm -v ${{ github.workspace }}:/tmp \
            cgr.dev/chainguard/malcontent:latest \
            --format=markdown analyze /tmp >> $GITHUB_STEP_SUMMARY

Figure: a very simple scan of files in GitHub Actions

3. Viewing diffs at pull request

To use differential analysis, remember a few things:

  1. First, build the original code with all the old dependencies (or pull in the finished artifacts).
  2. Then, build the proposed changes with the changed dependencies.
  3. Keep the two builds in separate directories.
  4. Create the diff between the two finished products to analyze.
  5. Upload that report somewhere it will be seen during code review, and/or fail the build based on a threshold.
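The first three steps can be sketched with `git worktree`, which checks out two commits of the same repository side by side. This is an illustration under assumptions rather than the approach in the linked workflow: the function name `checkout_pair` is hypothetical, the directory names `prior-commit` and `current-commit` are just labels, and the destination must be an absolute path because `git -C` changes directory first.

```shell
#!/usr/bin/env bash
# Check out two commits of one repository into sibling directories.
# Usage: checkout_pair <repo-dir> <old-ref> <new-ref> <absolute-dest-dir>
# The name `checkout_pair` is hypothetical.
checkout_pair() {
  local repo="$1" old="$2" new="$3" dest="$4"
  mkdir -p "$dest"
  # --detach avoids creating or moving any branch for these checkouts.
  git -C "$repo" worktree add --detach "$dest/prior-commit" "$old"
  git -C "$repo" worktree add --detach "$dest/current-commit" "$new"
}

# After building each side in place, the diff itself reuses the command
# shown earlier, pointed at the two directories:
#   docker run --rm -v "$dest":/tmp cgr.dev/chainguard/malcontent:latest \
#     --format=markdown diff /tmp/prior-commit /tmp/current-commit
```

Worktrees share one object store, so this avoids cloning the repository twice. Two plain clones would work just as well if that is simpler for your build.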

Here’s the full diff workflow example. It creates some lovely pull request comments, as shown below.

Figure: if only every naughty PR was so straightforward as to simply say “add malware”

4. Looking at GitHub Actions PRs

Now let’s add in how Dependabot works, some nuances of the GitHub Actions ecosystem, and our scanning knowledge from above to inspect PRs changing our Actions. (full example file)

Note that this does not work for grouped Dependabot PRs. It needs one PR per changed dependency, because we’re inspecting each changed dependency itself, not testing the project that uses them.

Setup

This workflow only needs to run on pull requests that target the default branch (main in this case) and change the workflow files. It also only needs to be able to read the content of the repo. Specifying all this in YAML looks like this:

name: 🔍 Malcontent differential analysis 🔍
on:
  pull_request:
    branches: ["main"]
    types:
      - opened
      - synchronize
      - reopened
    paths:
      - ".github/workflows/**.yml"
      - ".github/workflows/**.yaml"
permissions:
  contents: read

From here, there are three jobs that must be done in order.

Extract information about the Action

First, we need to parse the pull request for what’s changed, then output those changes as job outputs we can use later on in the workflow. The diff we’re interacting with looks like this:

diff --git a/.github/workflows/zizmor.yml b/.github/workflows/zizmor.yml
index 5bfbf9f..04ee17f 100644
--- a/.github/workflows/zizmor.yml
+++ b/.github/workflows/zizmor.yml
@@ -23,7 +23,7 @@ jobs:
           persist-credentials: false
 
       - name: Install the latest version of uv
-        uses: astral-sh/setup-uv@6b9c6063abd6010835644d4c2e1bef4cf5cd0fca # v6.0.1
+        uses: astral-sh/setup-uv@f0ec1fc3b38f5e7cd731bb6ce540c5af426746bb # v6.1.0
 
       - name: Run zizmor 🌈
         run: uvx zizmor --format=sarif . > results.sarif

We need to break this up into the Action’s repo (owner/repo) and the tag or SHA in use, for both the old and the new version. The tag or SHA needs to be sanitized to remove the version number that may follow it as an in-line comment.

This is a simple yet verbose set of tasks to do with grep. This section of the workflow contains the full set of scripting steps.
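As a rough illustration of that parsing, here is a minimal sketch using `grep` and `sed`. It is not the script from the linked workflow: the function names `changed_uses_lines` and `parse_uses_line` are hypothetical, and it assumes each changed line has the form `uses: owner/repo@ref`, optionally followed by a `# version` comment.

```shell
#!/usr/bin/env bash
# Both function names below are hypothetical, for illustration only.

# Read a unified diff on stdin and keep only the removed (-) and added (+)
# `uses:` lines. The ---/+++ file headers do not match this pattern.
changed_uses_lines() {
  grep -E '^[-+][[:space:]]*(-[[:space:]]+)?uses:'
}

# Split one `uses:` line into "<owner/repo> <ref>".
parse_uses_line() {
  local line="$1" spec
  # Drop a trailing "# v1.2.3" comment, then everything up to "uses:".
  spec=$(printf '%s\n' "$line" \
    | sed -E 's/[[:space:]]*#.*$//; s/^.*uses:[[:space:]]*//')
  # Text before the last "@" is the Action, text after it is the tag or SHA.
  printf '%s %s\n' "${spec%@*}" "${spec##*@}"
}
```

Fed the removed line from the diff above, `parse_uses_line` prints `astral-sh/setup-uv 6b9c6063abd6010835644d4c2e1bef4cf5cd0fca`. An Action that lives in a subdirectory (owner/repo/path@ref) would keep its path in the first field, so that case would need extra handling.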

Now figure out how to set up dependencies

From here, we’ll need to check out and build the code for this Action on the original and proposed changes. To figure out how to build the Action, we need to parse the action.yml file in the changed Action repository. This section of the workflow will tell us what needs to happen, then set it up.

An Action can be one of three things:

  1. JavaScript, optionally with dependencies vendored.
  2. A container, either built on each run or pulled from a registry.
  3. A composite Action, which is an “action of actions”. I purposely skipped over these to force manual inspection. As best I know, there’s no limit to how deep these can nest … 😰

Lucky for us, most things fall into those top two categories. Let’s do some bash if statements to figure out how to set these up.

# Read the Action's runtime from its metadata file
type=$(yq '.runs.using' action.yml)
if [[ $type == docker ]]; then
  image=$(yq '.runs.image' action.yml)
  if [[ $image == "Dockerfile" ]]; then
    echo "This Action uses a Dockerfile, building it..."
    # Build and push under the same tag that NEW_ACTION_IMAGE records
    docker build -t ghcr.io/${{ github.repository }}/new-action-image:latest .
    docker push ghcr.io/${{ github.repository }}/new-action-image:latest
    echo "NEW_ACTION_IMAGE=ghcr.io/${{ github.repository }}/new-action-image:latest" >> "$GITHUB_ENV"
  else
    echo "This Action uses a pre-built Docker image: $image"
    # Strip the docker:// scheme, keeping registry/owner/image:tag
    echo "NEW_ACTION_IMAGE=$(echo "$image" | cut -d'/' -f3-)" >> "$GITHUB_ENV"
  fi
elif [[ $type == node* ]]; then
  echo "This Action uses Node.js, installing dependencies..."
  npm install
elif [[ $type == composite* ]]; then
  # Composite Actions are left for manual inspection
  echo "The proposed new Action is a composite Action, skipping build." >> "$GITHUB_STEP_SUMMARY"
fi

From here, we now have a workspace that looks like this, complete with dependencies at each of those commits.

.
└── /tmp/
    ├── prior-commit/
    │   ├── action
    │   └── all action dependencies
    └── current-commit/
        ├── action
        └── all dependencies for it

This makes it easy to run malcontent in differential analysis mode as the next step. There’s a small in-line script here that changes the arguments based on whether or not it’s an image, but that’s it. Create a Markdown report, then upload it as a workflow artifact.
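The argument switch might look something like the sketch below. It is a guess at the shape, not the script from the linked workflow: the function name `diff_args` is hypothetical, the output file name is illustrative, and it assumes that `diff` accepts `-i` for images the same way `scan -i` does, which is worth confirming against the tool’s help output before relying on it.

```shell
#!/usr/bin/env bash
# Print the malcontent arguments for a differential scan.
# With two image references, compare the images; otherwise compare the
# two checked-out directories. The name `diff_args` is hypothetical.
diff_args() {
  local old_image="${1:-}" new_image="${2:-}"
  local common="--format=markdown --output=/tmp/malcontent-results.md"
  if [[ -n "$old_image" && -n "$new_image" ]]; then
    # ASSUMPTION: `diff` takes -i for images, mirroring `scan -i`.
    echo "$common diff -i $old_image $new_image"
  else
    echo "$common diff /tmp/prior-commit /tmp/current-commit"
  fi
}

# Intended use (word splitting of the output is deliberate):
#   docker run --rm -v "$PWD":/tmp cgr.dev/chainguard/malcontent:latest \
#     $(diff_args "$OLD_IMAGE" "$NEW_IMAGE")
```

Keeping the branch in one small function means the `docker run` line stays identical for both kinds of Action.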

Commenting on the PR is the easy part

From here, the next job takes that Markdown report and puts it verbatim into the pull request as a comment for a human to review. This workflow section is what does the task. GitHub supports in-line JavaScript with actions/github-script, which makes this a job for a short snippet.

const fs = require('fs');
const filePath = 'malcontent-results.md';

// Fail loudly if the earlier job did not produce a report
if (!fs.existsSync(filePath)) {
  throw new Error(`File not found: ${filePath}`);
}

const fileContent = fs.readFileSync(filePath, 'utf8');

// Await the API call so a failed comment fails the step
await github.rest.issues.createComment({
  issue_number: context.issue.number,
  owner: context.repo.owner,
  repo: context.repo.repo,
  body: fileContent,
});

Putting it all together

Put the full workflow file on the main branch in your repo. Wait for Dependabot, or anyone else, to open a PR bumping a GitHub Action.

🎉 Here’s what that analysis looks like in practice:

Figure: a likely uninteresting capabilities change in one dependency’s NPM dependencies (PR shown)

Final thoughts

Now I have a bit of visibility into the changes happening upstream in my GitHub Actions and in their upstream too. However, YARA rules can be noisy, so finding meaning in that data still takes human oversight. There’s also no guarantee that the capability change in a transitive dependency is used in the Action.

But at least there’s more information about the changes going on in that little one-line automated pull request bumping my Action forward, auto-magically commented in that proposed change. 🔎


Footnotes

  1. Okay, it’s a work in progress, but I’ve changed most of my projects to do this.

This post is licensed under CC BY-NC-SA 4.0 by the author.