Undo a commit on a failed GitHub Actions job
An interesting problem came my way not too long ago about how to undo a code change when a job fails in GitHub Actions. Since this job is triggered on a commit, we’d have to use git revert
to create and push a commit to undo itself.
The use case is an intermittent failure of terraform apply
after terraform plan
succeeded. The team wanted to roll back those changes in git automatically, then be able to troubleshoot once everything was working again. This could also be helpful if the team is using git to store configuration files to services that work as expected in one environment, but not others. Here’s how to do it:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
name: Randomly fail and undo the change
on: # this runs on push to the main branch, but should be customized for your deployments/builds
push:
branches:
- main
paths:
- 'test.txt' # only run this on pushes that touch the files you need
jobs:
randomly-fail: # this step randomly simulates failure
runs-on: ubuntu-latest
steps:
- name: Randomly fail
run: |
if [ "$(($RANDOM % 2))" -eq 0 ]; then
echo "randomly failing"
exit 1
fi
revert-change:
runs-on: ubuntu-latest
needs: [ randomly-fail ] # we need to say the job that sometimes fails is a dependency
if: ${{ failure() }} # because if it fails
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 2 # check out the current and prior state
- name: "Revert the commit that triggered this run, then push it to main"
shell: bash
run: |
echo "Reverting ${{ github.sha }}"
git config user.name github-actions
git config user.email github-actions@github.com
git revert --no-edit ${{ github.sha }}
git push origin main
Here’s what that looks like when the first part fails, but the second succeeds:
And in the git history, a commit from github-actions
undoing my failed code changes:
Now here’s a few ways on how to do it better - not using my demonstration-quality shenanigans like pushing to the main branch.
First, pay attention to the job trigger (the on:
block) and make sure it’s only running when appropriate. A complete list of events that could possibly trigger a job is here . You should further filter it to only run on the appropriate files/paths/branches, using this handy cheat sheet .
Set up branch protections for the main/default branch and instead of pushing directly to that branch, change that last line to push to the correct git ref. Your workflow can pick this up in an environment variable , such as which branch or pull request triggered the run.
Be aware of and possibly set up some validation that you won’t recursively undo more changes than the one that started this workflow. To this end, always use the default GITHUB_TOKEN
and not a personal access token or other credential. This will prevent triggering recursive workflows (documentation ) by pushing code. The rest of this recursion prevention will be up to whatever else it is going on while things are being undone.
Lastly, if you have other methods of rolling back changes at the ready, that’s probably a better place to start than this. This workflow only rolls back the changes in the code without validating anything about the state of the environment before, during, or after this change.