Surviving your first code audit, or Whodunnit? A git repo mystery.

Posted Jun 12, 2024 Updated Mar 16, 2025

By Natalie Somersall

From BSides Boulder 2024: many attempts to figure out who did what, when, where, and why in a git repository (and some lessons learned, too). This is an expanded set of slides and resources from the talk, which was presented live on 14 June 2024 (YouTube recording).

How do you know what you know about your codebase?

Can it be proven in an audit? Are you sure about that?

Here are some questions we’re trying to answer, all of which I’ve had to grapple with firsthand:

“My infrastructure is code.” How do I prove changes?
“There was an incident.” Were leaked secrets to blame?
“We’re making an acquisition.” Can we even purchase this?
“There are foreign nationals on parts of this contract.” What code are they changing?
“Our software factory must carry compliance certification.” How can we get there?
“The software is in scope for SOX¹.”
… and many many more …

Where we’re going

Intro - Hi, I’m Natalie and I’m here to help
Biases - getting this out of the way up front
Threat models - what problem are we even solving?
Git, but really fast - a whirlwind overview of what matters in the .git/ directory
Who did this? - all the ways you can’t prove who did what
Tips for auditing changes in git - some common ways to not prove what happened and other weird conversations
Time is meaningless and other terrible misunderstandings about how git understands time
Where to set compliance controls in regulated software developed in git repositories
Explaining why a code change happened during an audit is really hard to do, and even harder to prove, well after the change was made. Let’s make this reliably simple!
Why develop when you have to audit explores the business and people complexities on top of this deeply technical problem. Despite all the hardships we’ve reviewed, building software and systems in highly regulated environments can still be rewarding, fast-paced, and fun!

This is the process we’re going to dive into together. 🛟

```mermaid
flowchart LR
    A(fa:fa-user<br>developer) --> B(fa:fa-laptop-code<br>files)
    B --> C(fab:fa-git-alt<br>git repository<br>local)
    C --> D(fab:fa-github fab:fa-gitlab fab:fa-bitbucket<br>git repository<br>remote)
```
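
To make the "files" to "local git repository" step of that flow a little more concrete, here is a minimal sketch of how git names a file's content once it is stored. It assumes a repository using git's default SHA-1 object format (SHA-256 repositories differ) and content stored as-is, with no line-ending or other filters applied. The function name `git_blob_id` and the sample config lines are hypothetical, made up for illustration; the result should match what `git hash-object` reports for the same bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID git gives a file's content (assumes SHA-1 object format)."""
    # Git hashes a short header ("blob <size in bytes>" plus a NUL byte)
    # followed by the raw file bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Hypothetical infrastructure-as-code lines, not from any real system.
    original = b"allow_ssh = false\n"
    changed = b"allow_ssh = true\n"

    print(git_blob_id(original))
    print(git_blob_id(changed))

    # The same bytes always get the same ID; any change gets a different one.
    print(git_blob_id(original) == git_blob_id(original))
    print(git_blob_id(original) != git_blob_id(changed))
```

Note what the ID covers: the content, and nothing else. It says what the bytes are, not who wrote them, when, or why, which is why the rest of this series spends so much time on those questions.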

Intro

Hi, I’m Natalie. I do (what’s now called) “software factory” things with feds and defense folks, focusing on containerized application security these days. If any of these alphabet soups mean anything to you, that’ll give you a hint of what my work days look like:

NIST 800-53 and NIST 800-171 and NIST 800-172
NIST 800-190
ITAR
CMMC
FedRAMP
… probably more I’m forgetting …

I’ve been through and helped others through code audits that delve into difficult-to-answer questions about what we can prove about a codebase. These questions about your code repository become increasingly important as the “everything else as code” paradigm is adopted.

If your infrastructure, operations, identity or access controls, and system configurations are code, auditing a git repository is now on the critical path. These audits can be self-attested or conducted by a third party, which can make them harder to plan for. Here are a ton of ways this has gone poorly for me (or others), in hopes you’ll learn from my mistakes. 🫠

Biases

Experience has given me some ~~heavy baggage to carry~~ strong assumptions.

🌸 Git is hard. 🌸

The basics of “stage, commit, push” are easy to learn. Then add forking or branching, then opening pull (or merge) requests … now undo a change, or force an update, or resolve conflicts … the nuances of git are difficult to master. This is a talk about the implementation weirdness and how it maps to the basics of regulatory controls - proving who did what, when, where, and why.

🌸 Git is better than anything else. 🌸

Maybe a lot of that is inertia. Folks learn git in school these days. Distributed is normal now, which wasn’t always the case. Tools like desktop GUIs, IDE integrations, and webapps that host a bunch of peripheral data about your code make life so very much better.

🌸 Most people don’t interact with git from the command line. 🌸

It shouldn’t be necessary in most day-to-day development and that is okay. The cool tools hide a lot of complexity and footguns, letting you focus on the thing you’re building.

🌸 Identity and access management (IAM) is really hard. 🌸

Developer tools, regulatory compliance, endpoint management, and IAM are each multi-billion dollar industries. One could spend an entire career learning any one of these in depth. We’re at the intersection of these, so they’re all most likely in scope for an audit.

Threat models

For the record, I wrote and submitted this proposal before the whole xz backdoor² (CVE-2024-3094, writeup) thing happened. (What’s likely to have been) state-sponsored backdoors are an entirely different problem. Trust goes both ways - no “bad” code goes in (malware), no code leaves (IP threats). It’s a good place to talk about our threat model, though. There are four questions:

What are we working on?
What can go wrong?
What are we going to do about it?
Did we do a good enough job?

This time last year, we talked about threat modeling the GitHub Actions ecosystem and how you were the only person who could answer that last question. It’s much like Meat Loaf singing “I would do anything for love, but I won’t do that”: throughout the entire 8 minute rock opera ballad, he never says what “that” is. It was deliberate, so the listener could fill in the blank with their own personal “that.”

There’s no ambiguity today - the auditor or assessor will tell you.

Up next - what configurations really matter in a git repository? Part 2: Configuration matters

Footnotes

¹ The Sarbanes–Oxley Act of 2002 is a United States federal law that mandates certain practices in financial record keeping and reporting for corporations. More from Wikipedia.
² That fabulous MS-Paint style drawing of XZ is by Jerry Bell.

This post is licensed under CC BY-NC-SA 4.0 by the author.
