Please note that this post, first published over a year ago, may now be out of date.
It’s good practice when developing software to automate building, testing and deploying it in a CI/CD pipeline. CI/CD pipelines allow for quicker development cycles, automated testing, increased reliability and repeatable builds.
When a piece of software depends on other pieces of software, it’s a widely accepted practice to ‘pin’ the dependencies to specific versions. By pinning your dependencies to a specific release, you can ensure that nothing will change with the dependency from build to build. If you’re developing software in a heavily regulated industry, you might also need to audit your dependencies to ensure the code is safe. Pinning to a specific release lets you audit that specific release and use it for all future builds. At least, that’s the theory.
Software projects that use git often use git tags to mark specific points in the project’s history. In GitHub parlance, a ‘release’ is built on top of a git tag. You can then use the tag to identify a specific version of the software in your build process. Tags typically follow semantic versioning, for example v1.0.1.
So what’s the problem?
The problem with pinning software to tags, particularly for people with stringent security and auditing requirements, is that tags are not immutable.
But let’s rewind, what actually is a tag?
Tags come in two kinds: lightweight and annotated. A lightweight tag is simply a pointer to a specific commit. They’re quite limited, but they are what a plain git tag command creates by default. Annotated tags are more involved and are stored as full objects in the git object store. Like a commit, an annotated tag records a name, email, date and message, along with a pointer to the commit it tags, and it can also be cryptographically signed. Whichever kind you use, and whether you create it on the command line or through GitHub’s release functionality, the tag ultimately points to a commit.
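To make the difference between the two kinds of tag concrete, here is a minimal sketch that creates one of each in a throwaway repository and asks git what each name resolves to. The tag names v0.1-light and v0.1-annotated, the demo identity and the temporary directory are all made up for illustration.

```shell
set -eu

# Throwaway repository with a single commit (paths and names are illustrative).
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "Demo"
git -C "$repo" config user.email "demo@example.com"
echo "Safe audited version" > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m "My super safe software"

git -C "$repo" tag v0.1-light                          # lightweight: just a ref
git -C "$repo" tag -a v0.1-annotated -m "Release v0.1" # annotated: a tag object

# Ask git what kind of object each tag name refers to.
light_type=$(git -C "$repo" cat-file -t v0.1-light)
annotated_type=$(git -C "$repo" cat-file -t v0.1-annotated)
echo "lightweight tag refers to a: $light_type"
echo "annotated tag refers to a:   $annotated_type"

# "^{commit}" peels either kind of tag down to the commit it ultimately names.
light_commit=$(git -C "$repo" rev-parse "v0.1-light^{commit}")
annotated_commit=$(git -C "$repo" rev-parse "v0.1-annotated^{commit}")
head_commit=$(git -C "$repo" rev-parse HEAD)
```

The lightweight tag resolves straight to a commit, while the annotated tag resolves to a tag object that in turn points at the same commit.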
Why is this a problem?
Let me demonstrate. Here I’ve created a new, empty repository to represent an upstream project that my software depends on. To that repository I’ll add a simple text file, tag the release as v0.1 and push it all back to GitHub:
paul@blog:~/tag-blog (main #)$ echo "Safe audited version" > README
paul@blog:~/tag-blog (main #%)$ git add README
paul@blog:~/tag-blog (main +)$ git commit -m 'My super safe software'
[main (root-commit) 10ac2b7] My super safe software
1 file changed, 1 insertion(+)
create mode 100644 README
paul@blog:~/tag-blog (main)$ git tag -a v0.1 -m 'Release v0.1'
paul@blog:~/tag-blog (main)$ git push origin v0.1
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 1.03 KiB | 1.03 MiB/s, done.
Total 4 (delta 0), reused 0 (delta 0)
To github.com:omahn/tag-blog.git
* [new tag] v0.1 -> v0.1
Now imagine that we’re in our CI/CD pipeline and our software depends on this example repository at the version tagged v0.1. This is how that would typically be checked out during the build:
paul@blog:/tmp$ git clone --depth 1 --branch v0.1 git@github.com:omahn/tag-blog.git
Cloning into 'tag-blog'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 4 (delta 0), reused 4 (delta 0), pack-reused 0
Receiving objects: 100% (4/4), done.
Note: checking out '10ac2b790c7c8936f6e6d3488fe87c88962cebb4'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
paul@blog:/tmp$ cat tag-blog/README
Safe audited version
So far, so good. We should, in theory, now have a pipeline that will always check out v0.1, so we have a safe, repeatable build.
But in reality, we don’t, because git tags are mutable. Let me demonstrate by creating a new commit, deleting the existing release and creating a new one with the same release number:
paul@blog:~/tag-blog (main)$ echo "You're now compromised" > README
paul@blog:~/tag-blog (main *)$ git add README
paul@blog:~/tag-blog (main +)$ git commit -m 'Compromised'
[main 1494f07] Compromised
1 file changed, 1 insertion(+), 1 deletion(-)
paul@blog:~/tag-blog (main)$ git tag -d v0.1
Deleted tag 'v0.1' (was 0b454da)
paul@blog:~/tag-blog (main)$ git tag -a v0.1 -m 'Release v0.1'
paul@blog:~/tag-blog (main)$ git push -f origin v0.1
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 1.06 KiB | 1.06 MiB/s, done.
Total 4 (delta 0), reused 0 (delta 0)
To github.com:omahn/tag-blog.git
+ 0b454da...8e987a1 v0.1 -> v0.1 (forced update)
Now let’s run our pipeline again:
paul@blog:/tmp$ git clone --depth 1 --branch v0.1 git@github.com:omahn/tag-blog.git
Cloning into 'tag-blog'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 4 (delta 0), reused 4 (delta 0), pack-reused 0
Receiving objects: 100% (4/4), done.
Note: checking out '1494f07943e678b09ac9ca625280e6a14a8aa599'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
paul@blog:/tmp$ cat tag-blog/README
You're now compromised
Oh dear. Through no fault of our own and without any changes to the build dependencies, we’re now building using a different underlying dependency despite keeping the release number the same.
It’s worth noting that if you’re testing this locally on an existing clone of the repository then you will not see this behaviour. When git fetches or pulls from a remote repository it will not, by default, update tags that already exist locally. The issue only arises in ‘clean room’ environments like pipelines, where every build involves a fresh clone of the remote repository.
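The local behaviour can be reproduced without GitHub. The sketch below uses two throwaway repositories on disk, one standing in for the upstream project and one for a long-lived local clone. The paths, the demo identity and the variable names are assumptions for illustration only.

```shell
set -eu

work=$(mktemp -d)
upstream="$work/upstream"

# Stand-in for the upstream project, tagged v0.1 at the audited commit.
git init -q "$upstream"
git -C "$upstream" config user.name "Demo"
git -C "$upstream" config user.email "demo@example.com"
echo "Safe audited version" > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" commit -q -m "My super safe software"
git -C "$upstream" tag -a v0.1 -m "Release v0.1"
audited=$(git -C "$upstream" rev-parse "v0.1^{commit}")

# A long-lived clone, made while the tag still pointed at the audited commit.
git clone -q "$upstream" "$work/clone"

# Upstream then moves the tag to a new commit.
echo "You're now compromised" > "$upstream/README"
git -C "$upstream" commit -q -a -m "Compromised"
git -C "$upstream" tag -f -a v0.1 -m "Release v0.1" >/dev/null
moved=$(git -C "$upstream" rev-parse "v0.1^{commit}")

# A plain fetch brings in the new commit but leaves the existing local tag alone.
git -C "$work/clone" fetch -q origin
local_tag=$(git -C "$work/clone" rev-parse "v0.1^{commit}")
echo "upstream v0.1 -> $moved"
echo "local    v0.1 -> $local_tag"
```

The existing clone still sees v0.1 at the audited commit, which is exactly why the change is easy to miss on a developer machine and only shows up in a fresh clone.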
How can we avoid this?
Thankfully, it’s quite simple. Don’t use tags. At least, don’t use tags to mark trusted refs. Tags point to a commit, so instead of referencing the tag, just reference the commit directly. Let me demonstrate. Here’s the list of commits for our sample dependency:
paul@blog:~/tag-blog (main)$ git log --oneline
1494f07 (HEAD -> main, tag: v0.1) Compromised
10ac2b7 My super safe software
Originally v0.1 was pointing at 10ac2b7, so let’s use that in our pipeline instead of the v0.1 tag and see what happens:
paul@blog:/tmp$ git clone git@github.com:omahn/tag-blog.git
Cloning into 'tag-blog'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 7 (delta 0), reused 7 (delta 0), pack-reused 0
Receiving objects: 100% (7/7), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.
paul@blog:/tmp$ cd tag-blog/
paul@blog:/tmp/tag-blog (main #)$ git checkout 10ac2b7
Note: checking out '10ac2b7'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 10ac2b7 My super safe software
paul@blog:/tmp/tag-blog ((10ac2b7...))$ cat README
Safe audited version
This way, I get the safe audited version back again, and always will. Why? Because a git commit ID is a SHA-1 hash that covers the commit itself, the tree of objects in the commit and all the content inside that tree. So if anything changes then the commit ID must also change.
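The chain from commit to content can be inspected directly. This is a small sketch in a throwaway repository (the path and demo identity are illustrative) that follows a commit to its tree and to the blob for README, then checks that the blob ID is derived from nothing but the file content.

```shell
set -eu

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "Demo"
git -C "$repo" config user.email "demo@example.com"
echo "Safe audited version" > "$repo/README"
git -C "$repo" add README
git -C "$repo" commit -q -m "My super safe software"
commit_id=$(git -C "$repo" rev-parse HEAD)

# The commit object records the ID of a tree, plus author, date and message.
git -C "$repo" cat-file -p "$commit_id"

# The tree in turn records a blob ID for each file it contains.
tree_id=$(git -C "$repo" rev-parse "$commit_id^{tree}")
blob_in_commit=$(git -C "$repo" rev-parse "$commit_id:README")

# Hashing the file content on its own gives the same blob ID...
blob_from_content=$(git -C "$repo" hash-object "$repo/README")

# ...while different content gives a different ID, which would change the
# tree ID and therefore the commit ID too.
blob_tampered=$(echo "You're now compromised" | git -C "$repo" hash-object --stdin)

echo "blob recorded in commit: $blob_in_commit"
echo "blob from file content:  $blob_from_content"
echo "blob for other content:  $blob_tampered"
```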
You might be worried about a hash collision, where an attacker deliberately crafts a different commit whose hash matches an existing one. Git defends against this by refusing to overwrite an object it already has with another object that has the same hash.
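For a pipeline, the approach above could be wrapped up along the following lines. This is only a sketch: checkout_pinned is a hypothetical helper, not part of git, and it assumes a SHA-1 repository where a full commit ID is 40 hexadecimal characters. It insists on the full ID rather than an abbreviation like 10ac2b7, because a short prefix can become ambiguous as a repository grows. The upstream repository here is a local stand-in built in a temporary directory.

```shell
set -eu

# checkout_pinned REPO_URL FULL_COMMIT_ID DEST
# Hypothetical helper: clone, check out exactly one commit, verify the result.
checkout_pinned() {
  repo_url=$1; commit=$2; dest=$3
  # Require a full 40-hex-digit SHA-1 ID; tags, branches and short IDs are rejected.
  case "$commit" in
    ""|*[!0-9a-f]*) echo "not a full commit ID: $commit" >&2; return 1 ;;
  esac
  if [ "${#commit}" -ne 40 ]; then
    echo "not a full commit ID: $commit" >&2
    return 1
  fi
  git clone -q --no-checkout "$repo_url" "$dest"
  git -C "$dest" checkout -q --detach "$commit"
  # Confirm HEAD really is the pinned commit before the build continues.
  [ "$(git -C "$dest" rev-parse HEAD)" = "$commit" ]
}

# Local stand-in for the upstream project, with v0.1 moved after the audit.
work=$(mktemp -d)
upstream="$work/upstream"
git init -q "$upstream"
git -C "$upstream" config user.name "Demo"
git -C "$upstream" config user.email "demo@example.com"
echo "Safe audited version" > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" commit -q -m "My super safe software"
audited=$(git -C "$upstream" rev-parse HEAD)
git -C "$upstream" tag -a v0.1 -m "Release v0.1"
echo "You're now compromised" > "$upstream/README"
git -C "$upstream" commit -q -a -m "Compromised"
git -C "$upstream" tag -f -a v0.1 -m "Release v0.1" >/dev/null

# The build pins the audited commit ID, so the moved tag has no effect.
checkout_pinned "$upstream" "$audited" "$work/build"
cat "$work/build/README"
```

Passing a tag name such as v0.1 to this helper fails by design, which keeps a mutable ref from slipping into the pipeline configuration.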
Do I need to worry about this?
It depends. If you are depending on projects directly from GitHub (or any other git provider) then it’s something you should be aware of if you’re using tags to pin those dependencies. If you operate in a regulated environment that requires auditing of dependencies in your software then you should definitely revise your pipelines to target specific commits as that guarantees the underlying software cannot be tampered with.
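If you can’t move away from tags straight away, one stopgap is to record the commit a tag pointed at when you audited it and have the pipeline fail if the tag ever resolves to anything else. The sketch below shows one way that might look. The helpers remote_tag_commit and verify_tag are hypothetical names, and the upstream repository is again a local stand-in built in a temporary directory.

```shell
set -eu

# remote_tag_commit REPO_URL TAG
# Hypothetical helper: print the commit a remote tag currently resolves to.
# For annotated tags, ls-remote also lists a peeled "<ref>^{}" entry naming
# the commit; lightweight tags only have the plain entry.
remote_tag_commit() {
  git ls-remote --tags "$1" | awk -v ref="refs/tags/$2" '
    $2 == ref          { plain = $1 }
    $2 == (ref "^{}")  { peeled = $1 }
    END { print (peeled != "" ? peeled : plain) }'
}

# verify_tag REPO_URL TAG EXPECTED_COMMIT
# Hypothetical helper: fail if the tag no longer points at the audited commit.
verify_tag() {
  actual=$(remote_tag_commit "$1" "$2")
  if [ "$actual" != "$3" ]; then
    echo "tag $2 points at ${actual:-nothing}, expected $3" >&2
    return 1
  fi
}

# Local stand-in for the upstream project.
work=$(mktemp -d)
upstream="$work/upstream"
git init -q "$upstream"
git -C "$upstream" config user.name "Demo"
git -C "$upstream" config user.email "demo@example.com"
echo "Safe audited version" > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" commit -q -m "My super safe software"
git -C "$upstream" tag -a v0.1 -m "Release v0.1"
audited=$(git -C "$upstream" rev-parse HEAD)   # recorded at audit time

if verify_tag "$upstream" v0.1 "$audited"; then before=ok; else before=moved; fi

# Upstream moves the tag after the audit.
echo "You're now compromised" > "$upstream/README"
git -C "$upstream" commit -q -a -m "Compromised"
git -C "$upstream" tag -f -a v0.1 -m "Release v0.1" >/dev/null

if verify_tag "$upstream" v0.1 "$audited" 2>/dev/null; then after=ok; else after=moved; fi
echo "before: $before, after: $after"
```

A check like this only detects that a tag has moved; pinning the build to the commit ID itself is still what keeps the moved tag from being built.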
This blog is written exclusively by The Scale Factory team. We do not accept external contributions.