Optimizing a repo for GitHub Pages
My website, plus a bunch of other random things, was in one mono-repo. It became a junk drawer I would definitely deal with … later. Spending the past few weeks on the road made it a great time for low-stakes, low-effort tasks - snacking through the old “to do” list.
🙈 This “junk drawer” repository pattern is terrible for static site hosts.
My site is hosted on GitHub Pages, but the same problem exists on GitLab or other code forges. It also applies to dedicated web hosting platforms or web storage buckets (such as AWS S3). A junk-drawer repo is slow to build and much larger than it needs to be, which makes it likely to hit a hosting platform’s limits sooner than a repository with a few simple optimizations. While there was plenty of time until I hit the free limits for GitHub, I didn’t want to have to worry about it with some conference talks coming up.
Let’s walk through how I cleaned up my website to give it room to grow. Cleaning up this repo reduced both the website’s size and build times by roughly two-thirds. 🏎️
How’s this site work?
When I commit a markdown file into a repository with website content, a hook in my code hosting platform triggers a build job … the same as every other project’s continuous integration jobs. This job (sketched in the workflow after this list):
- Checks out the whole “junk drawer” repository (including large files stored with Git LFS)
- Builds it with a static site generator, such as Jekyll or Hugo
- Runs some validation, like checking links or other “tidiness tasks”
- Uploads the finished files to the site host - sometimes a static asset bucket or self-hosted solution, but for me it’s a simple GitHub Pages site because I don’t want to be a web host
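For a rough sketch of what that workflow looks like (illustrative only, not my exact file - the action versions, branch name, and build flags are assumptions):

```yaml
name: Build and deploy site
on:
  push:
    branches: [main]

permissions:
  contents: read
  pages: write    # deploy to GitHub Pages
  id-token: write # mint the OIDC token the Pages deployment uses

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          lfs: true                  # pull the big files stored in Git LFS too
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true        # install the gems from the Gemfile
      - name: Build the site
        run: bundle exec jekyll build -d _site
      - name: Check links and other tidiness tasks
        run: bundle exec htmlproofer _site --disable-external
      - uses: actions/upload-pages-artifact@v3
        with:
          path: _site                # hand the finished files to Pages

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
    steps:
      - uses: actions/deploy-pages@v4  # publish the uploaded artifact
```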
There are a couple places to improve the size, speed, or efficiency of the site.
Cleaning up the repository
To start with, the repository itself needed cleaning up. Much of the drudgery and size came from how it was structured.
First, I removed the old repository entirely and started a brand new empty one for it. Losing the change history meant not having to worry about rewriting it - that process is a giant pain. Besides, starting fresh also meant I could move files to where they really should be. I expanded what’s stored in large file storage to include all pictures and GIFs. Without a new repository, moving them into LFS would have been time-consuming.
My .gitattributes file now looks like this, drastically expanding what’s in LFS by default:
# Denote all files that are truly binary and should not be modified.
*.ico binary
# Use git-lfs for some files
*.pdf filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.xlsx filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.jpeg filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.webp filter=lfs diff=lfs merge=lfs -text
*.svg filter=lfs diff=lfs merge=lfs -text
This also meant changing the build process to check out the LFS files too. This page is built using Actions, so adding that was a single line of YAML.
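For reference, that one line is the lfs input on the checkout step - a hypothetical excerpt, with the action version assumed:

```yaml
# inside the build job's steps
- uses: actions/checkout@v4
  with:
    lfs: true   # the one added line - also fetch files stored in Git LFS
```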
Since I was creating a ✨ brand new ✨ repository, I took the opportunity to compress all of the pictures as well, further reducing the repository size by about 100 MB. I also made many smaller consistency updates to make my life easier - think tasks such as renaming older pics to match the naming convention of ~/assets/graphics/DATE-POST/image-path.here or fixing spelling errors that somehow weren’t caught before now.
Fixing the errors from fixing the spelling
All these little fixes created at least as many build errors. 🫠
Most of these were simple to find using a little function I’d written to build the site and run htmlproofer over it. Jekyll - like most other Ruby programs I use - runs in a container to isolate the dependencies of any one program from the system-wide Ruby installation. I try to do the same for Python as well.
# Build the site in a container, then check it with htmlproofer
function check-website {
  if [ "${1}" = "-h" ]; then
    echo "Usage: check-website [path]"
    echo "Check website with htmlproofer."
    return
  fi
  # clear out stale build artifacts, then build and link-check inside the container
  rm -rf Gemfile.lock _site .jekyll-cache && \
  docker run -it --rm \
    --volume="$PWD:/work" \
    ghcr.io/some-natalie/jekyll-in-a-can:latest /bin/sh -c \
    "bundle exec jekyll b -d '_site' && \
    bundle exec htmlproofer _site --disable-external"
}
Once that caught most of the errors, I then fixed the post titles and redirected them to preserve any bookmarks. Small shoutout to the fantastic gem jekyll-redirect-from, which allowed me to fix spelling errors in post titles while maintaining the old (misspelled) title too. Adding this required three things:
First, add it to the site’s ~/Gemfile
gem "jekyll-redirect-from", "~> 0.16", ">= 0.16.0"
Next, tell Jekyll to use it in ~/_config.yml
plugins:
- jekyll-redirect-from
Lastly, configure redirects as needed in the post frontmatter. Here’s an example, where I’d misspelled the title in the original filename.
---
title: "The cost of waiting on builds"
date: 2022-10-03
tags:
- CI
- business
excerpt: "How much does it cost to wait on builds?"
redirect_from:
- /blog/waiting-on-bulids/
---
… and that’s it! 🎉
Other repo hygiene tasks
Since I was cleaning house, I also fixed a ton of spelling errors and other “linter nags”. To prevent these moving forward, a new job runs on each pull request to lint posts before they go live, using markdownlint inside of super-linter.
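A minimal sketch of that pull request job, assuming a workflow along these lines (the action version, branch name, and layout are assumptions, not my exact setup):

```yaml
name: Lint pull requests
on: pull_request

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0             # super-linter wants full history to diff changed files
      - uses: super-linter/super-linter@v6
        env:
          DEFAULT_BRANCH: main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          VALIDATE_MARKDOWN: true    # only run markdownlint, skip the other linters
```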
However, it complained about a few things that I need to allow every time. That’s easy to configure with an extra markdownlint configuration file.
{
  "line-length": false,
  "MD033": { "allowed_elements": ["br", "summary", "details", "div"] },
  "MD031": false
}
How much faster?
Optimizing the storage of images, both by shrinking them and moving them into LFS storage, was incredibly impactful and simple to do.
| what | before | after | improvement |
|---|---|---|---|
| build time | 14 seconds | 5 seconds | 64% decrease |
| archive size | 250 MB | 102 MB | 59% decrease |
| repo size | 680 MB | 336 MB | 51% decrease |
The law of diminishing returns kicks in pretty quickly after this. The build could possibly get a little faster, but Jekyll is single-threaded so it’s not likely to be a huge difference. If I need it to get smaller still, the next step is probably moving the images and other large files into a content delivery service … which is way more work than I care to do at the moment. 🦥
In conclusion, the easiest and most impactful changes to optimize a repository for GitHub Pages are:
- use large file storage as much as possible
- compress your images before committing them
- starting fresh is great - don’t be afraid to lose history instead of struggling to rewrite it