Surviving your first code audit, or Whodunnit? A git repo mystery.

From BSides Boulder 2024, many attempts to figure out who did what, when, where, and why in a git repository (and some lessons learned, too). This is an expanded set of slides and resources, updated since the talk was shown live on 14 June 2024.

How do you know what you know about your codebase?

Can it be proven in an audit? Are you sure about that?

Here are some of the questions we’re trying to answer, all of which I’ve had to grapple with firsthand:

  • “My infrastructure is code.” How do I prove changes?
  • “There was an incident.” Were leaked secrets to blame?
  • “We’re making an acquisition.” Can we even purchase this?
  • “There are foreign nationals on parts of this contract.” What code are they changing?
  • “Our software factory must carry compliance certification.” How can we get there?
  • “The software is in scope for SOX[1].”
  • … and many many more …

Where we’re going

This is the process we’re going to dive into together. 🛟

flowchart LR
    A(fa:fa-user\ndeveloper) --> B(fa:fa-laptop-code\nfiles)
    B --> C(fab:fa-git-alt\ngit repository\nlocal)
    C --> D(fab:fa-github fab:fa-gitlab fab:fa-bitbucket\ngit repository\nremote)

Intro

Hi, I’m Natalie. I do (what’s now called) “software factory” things with feds and defense folks, focusing on containerized application security these days. If any of these alphabet soups mean anything to you, that’ll give you a hint of what my work days look like:

I’ve been through and helped others through code audits that delve into difficult-to-answer questions about what we can prove about a codebase. These questions about your code repository become increasingly important as the “everything else as code” paradigm is adopted.

If your infrastructure, operations, identity or access controls, and system configurations are code, auditing a git repository is now on the critical path. These audits can be self-attested or third party, which can make them harder to plan for. Here are a ton of ways this has gone poorly for me (or others), in hopes you’ll learn from my mistakes. 🫠

Biases

Experience has given me some heavy baggage to carry, er, strong assumptions.

git is hard

🌸 Git is hard. 🌸

The basics of “stage, commit, push” are easy to learn. Then add forking or branching, then opening pull (or merge) requests … now undo a change, or force an update, or resolve conflicts … the nuances of git are difficult to master. This is a talk about the implementation weirdness and how it maps to the basics of regulatory controls - proving who did what, when, where, and why.
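To make "who did what, when" a little more concrete, here's a minimal sketch of pulling the claimed author and committer out of `git log` output. It assumes you've run `git log` yourself with the format string shown in the comment (those placeholders are standard `git log` pretty-format fields). The `CommitClaim` type and `parse_log_line` helper are hypothetical names made up for this illustration. Note the word "claimed": these names and emails come from whatever the person committing had in their git config, so on their own they show what a commit says about itself, not verified identity.

```python
from dataclasses import dataclass

# Unit separator byte; git prints it wherever the format string has %x1f.
SEP = "\x1f"

# Assumed invocation (run it yourself, this sketch does not call git):
#   git log --format=%H%x1f%an%x1f%ae%x1f%aI%x1f%cn%x1f%ce%x1f%cI%x1f%s
GIT_LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%aI%x1f%cn%x1f%ce%x1f%cI%x1f%s"


@dataclass(frozen=True)
class CommitClaim:
    """What one commit claims about itself (hypothetical helper type)."""
    sha: str              # %H  - what
    author_name: str      # %an - who wrote it (self-reported)
    author_email: str     # %ae
    authored_at: str      # %aI - when it was written (ISO 8601)
    committer_name: str   # %cn - who recorded it (self-reported)
    committer_email: str  # %ce
    committed_at: str     # %cI - when it was recorded (ISO 8601)
    subject: str          # %s  - first line of the "why"

    @property
    def author_is_committer(self) -> bool:
        # Compare emails case-insensitively; a mismatch is worth a second look
        # (rebase, cherry-pick, web UI merge), not proof of anything by itself.
        return self.author_email.lower() == self.committer_email.lower()


def parse_log_line(line: str) -> CommitClaim:
    """Parse one line produced with GIT_LOG_FORMAT into a CommitClaim."""
    fields = line.rstrip("\n").split(SEP, 7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    return CommitClaim(*fields)


if __name__ == "__main__":
    # Made-up sample record, not real repository data.
    sample = SEP.join([
        "0123abcd", "Alice Example", "alice@example.com",
        "2024-06-14T09:00:00-06:00",
        "Bob Example", "bob@example.com",
        "2024-06-14T10:30:00-06:00",
        "Fix the thing",
    ])
    claim = parse_log_line(sample)
    print(claim.author_email, "->", claim.committer_email)
    print("author is committer:", claim.author_is_committer)
```

Even this tiny sketch shows two "who" fields and two "when" fields per commit, which is a small taste of the weirdness an auditor's simple-sounding question runs into.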

🌸 Git is better than anything else. 🌸

Maybe a lot of that is inertia. Folks learn git in school these days. Distributed version control is normal now, which wasn’t always the case. Tools like desktop GUIs, IDE integrations, and webapps that host a bunch of peripheral data about your code make life so very much better.

🌸 Most people don’t interact with git from the command line. 🌸

It shouldn’t be necessary in most day-to-day development and that is okay. The cool tools hide a lot of complexity and footguns, letting you focus on the thing you’re building.

🌸 Identity and authorization (IAM) is really hard. 🌸

Developer tools, regulatory compliance, endpoint management, and IAM are each multi-billion-dollar industries. One could spend an entire career learning any one of these in depth. We’re at the intersection of all four, so they’re all most likely in scope for an audit.

Threat models

xz

For the record, I wrote and submitted this proposal before the whole xz backdoor[2] (CVE-2024-3094, writeup) thing happened. (What’s likely to have been) state-sponsored backdoors are an entirely different problem. Trust goes both ways - no “bad” code goes in (malware), no code leaves (IP threats). It’s a good place to talk about our threat model, though. There are four questions:

  1. What are we working on?
  2. What can go wrong?
  3. What are we going to do about it?
  4. Did we do a good enough job?

This time last year, we talked about threat modeling the GitHub Actions ecosystem and how you were the only person who could answer that last question. It’s much like Meat Loaf singing “I would do anything for love, but I won’t do that”: throughout the entire 8-minute rock opera ballad, he never answers what “that” was. It was deliberate, so the listener could fill in the blank with their own personal “that.”

There’s no ambiguity today - the auditor or assessor will tell you.

Up next - what configurations really matter in a git repository? Part 2: Configuration matters


Footnotes

  1. The Sarbanes–Oxley Act of 2002 is a United States federal law that mandates certain practices in financial record keeping and reporting for corporations. More from Wikipedia 

  2. That fabulous MS-Paint style drawing of XZ is by Jerry Bell 

This post is licensed under CC BY 4.0 by the author.