Fedora is aiming for fully reproducible builds

Fedora Linux is aiming to get support for fully reproducible builds. This change represents state-of-the-art security practice, and it will greatly help alleviate supply chain cyberattacks and inconsistent builds from hardware failure or other causes.

“Unlike Windows computers, Linux is not vulnerable to malware!”

How often have you heard this sentence? I am willing to bet, plenty. One of the most widespread myths about Linux is that Linux users are inherently more secure than Windows users, because “Linux can't catch malware”.

This is obviously false, as anyone with even a basic understanding of computer science can guess: there is no reason why a Linux computer would not be able to run malicious code. Digging deeper into this myth, though, it is easy to see where the people who make this claim are coming from: since Windows holds the majority of the desktop market share, most cyberattacks, in the form of malicious ads or phishing e-mails, are going to target Windows computers.

However, this overly basic understanding of cybersecurity and malware is far removed from reality. Even if clicking on a malicious attachment were the only way one could get infected, malicious code can run on more platforms than one might think: if Doom can run inside a PDF file in the Chromium PDF viewer on any operating system, why would your Linux system be inherently any safer?

Doom game running in a PDF file

But we are not here to discuss this. In 2025, most people know better than to download and open a file of questionable origin. The most effective cyberattacks are actually a lot more complex than that. In the past, there have been several cases in which an attacker managed to hijack the small, independently maintained website that a developer used to distribute binaries for their program, and leveraged that to distribute an infected payload to end users. There are ways to counter this, for example, by checking the hash of the file you downloaded. But does it mean much, if the reference hash you are comparing it against comes from a webpage that has itself been modified?
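For reference, the hash check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended tool: the function names are hypothetical, and SHA-256 is an assumption (projects publish different digest types). Note that it only proves the file matches the published hash; it says nothing about whether that published hash can be trusted.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a file's digest against a hash copied from the project's page."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

If the attacker controls the page the hash was copied from, `matches_published_hash` will happily return `True` for the infected file, which is exactly the weakness described above.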

You might be tempted to write this off as something that only affects Windows and Mac users. After all, in Linux land, the standard way to install a piece of software is to use your distro's package manager to obtain the package from a centralized, trustworthy source. This already makes installing software remarkably more secure: your distro's infrastructure is properly safeguarded, and it is likely more secure than the home server some random independent developer is hosting. But is it perfect?

We use package managers here!

Imagine if an attacker was able to carry out a sophisticated enough attack that they were able to get the infected version of a certain package — perhaps something critical, like a library that a lot of other applications you may have installed depend on, or the Linux kernel itself — directly in your distro's repositories. And imagine nobody ever finds out.

You don't have to imagine. This is called a supply-chain attack, and it is one of the attacks that you, the end user, cannot do a lot to avoid. This is what happened to the xz compression library. I will keep this short, because the story deserves an entire dedicated article, but the xz project was compromised by a malicious maintainer going by the name of "Jia Tan". Jia Tan spent several years building a reputation as a core contributor to the project, only to then abuse that trust to include a backdoor in the source code tarball releases. Linux distros typically build their packages from those releases, so the malicious code was on its way into a gigantic number of Linux systems. Thankfully, the malicious payload was discovered early enough that only users who had very recently upgraded their systems were affected, but it could have been a disaster.

Image credits: grindinsoft.com

This is bad. Forget about obviously suspicious email attachments from shady senders — this kind of attack does not spare anybody.

How do we solve this?

While there is not much that can be done at the distro level to prevent future situations like this, Linux distributions can still take steps to ensure that the packages they build and distribute through their own infrastructure are exactly what you would expect them to be.

What if there were a way to ensure that the binaries produced by compiling and packaging a piece of software perfectly match the source code, with nothing slipping through the cracks? Good news: it exists. It's called reproducible builds.

What are reproducible builds?

Quoting the official definition from the Reproducible Builds initiative directly,

Reproducible builds are a set of software development practices that create an independently-verifiable path from source to binary code.

A build environment produces reproducible builds when it is deterministic: the same inputs always yield the same outputs. If anybody building the same source code, on the same operating system, with the same compiler and build instructions, is able to reproduce a bit-by-bit identical copy of an artifact, then the builds are reproducible.
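To make the definition above concrete, here is a toy sketch in Python. It is not how any distro builds packages: the "build" is just packing a couple of made-up source files into a tar archive, and the function names are hypothetical. It shows two classic sources of non-determinism (file ordering and timestamps) and how pinning them makes two independent builds bit-for-bit identical.

```python
import hashlib
import io
import tarfile


def build_archive(files, mtime):
    """Pack a {name: bytes} mapping into an uncompressed tar archive.

    Entries are sorted by name and all metadata is pinned, so the output
    depends only on the file contents and the mtime that is passed in.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime          # pinned instead of "now"
            info.mode = 0o644
            info.uid = info.gid = 0     # no trace of the building user
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def digest(artifact):
    """Hex SHA-256 of an in-memory artifact."""
    return hashlib.sha256(artifact).hexdigest()


# Made-up inputs for illustration only.
SOURCES = {"main.c": b"int main(void) { return 0; }\n", "README": b"toy project\n"}

first = build_archive(SOURCES, mtime=0)
# Same inputs supplied in a different order: the output must not change.
second = build_archive(dict(reversed(list(SOURCES.items()))), mtime=0)
# Same inputs, but a different timestamp leaks into the artifact.
leaky = build_archive(SOURCES, mtime=1_700_000_000)

print(digest(first) == digest(second))  # identical artifacts
print(digest(first) == digest(leaky))   # the timestamp alone breaks reproducibility
```

Anyone who runs the first two builds gets the same digest, so they can verify each other's output; the third build differs only because a build-time value leaked into the artifact.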

Infographic on how reproducible builds work. Image source: aspirationtech.org

If you were able to prove that a finished package build you get is perfectly equivalent to what you were expecting to get, it would be trivially easy to avoid a host of issues: an attacker would have no business trying to ship malware by compromising your package repos. A hardware defect, such as a bit flip in memory on a compile server, would not cause a broken package build that would break in subtle ways on people's computers. All in all, users would be running systems that are more secure and more reliable, eliminating an entire class of possible bugs.

An unsung benefit is that this really aids the debuggability of artifacts: if the build output is provably derived 1:1 from the source code, then it is also easier to debug unexpected behaviors. It can be taken for granted that a malfunction is indeed due to a bug in the source code, rather than a side effect of some bit flip that happened in non-ECC memory at compile time.

Debian's journey with reproducible builds

While the topic of reproducible builds in Linux distributions is a hot one, it is not a new practice. The GNU project has been involved with reproducible builds since 1990, when they were making active efforts to make the builds for their GNU coreutils bit-for-bit reproducible.

Not much later, people started talking about integrating reproducible builds in the Debian project. The topic was first mentioned in 2000, and then more explicitly in 2007, when Martin Uecker raised the possibility of implementing reproducible builds on the debian-devel mailing list:

“I think it would be really cool if the Debian policy required that packages could be rebuild bit-identical from source.”
- Martin Uecker, 2007
Conversation about reproducible builds in debian-devel, 2007

After that, interest in reproducible builds grew slowly through the years: first due to the Bitcoin project implementing them, then due to the disclosures on global surveillance in 2013. In light of those findings, Mike Perry began working to make Tor Browser builds reproducible, in order to stop a third party from attacking the infrastructure directly to compromise the browser. The threat he had in mind was:

malware that attacks the software development and build processes themselves to distribute copies of itself to tens or even hundreds of millions of machines in a single, officially signed, instantaneous update
- Mike Perry, Tor Project, 2013

The Debian project finally started working on reproducible builds for their own infrastructure in 2013, after considering the idea at a discussion held in July at DebConf13. The discussion had been organized at the last minute, and it only had 30 attendees, including the technical committee and other core teams. Other projects that had already been implementing the technology, such as Bitcoin, were brought up as a demonstration of the feasibility and the advantages of such a system.

Reproducible builds mentioned in DebConf 2013

The Debian community was so interested in the outcome of that discussion that a wiki page about reproducible builds was created, with the idea of getting five packages to build reproducibly as a proof of concept. Getting that PoC ready unveiled several critical areas in the toolchain that needed to be addressed to make even a single package build reproducibly, and this is where the bulk of the work began. That work proceeded very quickly, too: after some infrastructural modifications and two mass rebuilds, 67% of packages already built reproducibly by August 2014, in time for DebConf14.

Wiki page on Reproducible Builds on Debian's documentation

Debian's labor bore fruit: by July 2017, 90% of the packages distributed by the Debian project were reproducible and, today, all of them should support reproducible builds.

Current Debian homepage on Reproducible Builds

What Fedora is doing in this direction

While Debian is a virtuous example of adopting this technology, Fedora is not quite there yet. For the longest time, the need for reproducible builds was not evident within the project: all RPMs are built in a centralized, strongly protected environment. All packages are built from dist-git, a git repository that contains the build instructions and a cryptographic hash of the package sources. This system makes it easy to verify what changed between package builds, with what inputs they were built, and what changed in the build environment. Because of this very strong, vertical control over the entire build process, reproducible builds were long not considered a priority in the Fedora project.

There are also some caveats to implementing reproducible builds in Fedora the same way Debian has: because RPM packages are distributed after signing, with the signature embedded in the RPM itself, it is impossible to achieve fully identical results. Another obstacle is that RPM builds inject some information about the build time into the headers, which by definition changes from one build to the next. However, there is ongoing discussion about this issue.

That, however, does not mean the Fedora project is not taking steps to make their builds at least far more reproducible than they currently are. The work kicked off after a discussion at the RPM Developers' Meetup at DevConf.CZ 2023, a recurring event where RPM developers catch up on the status of RPM packaging.
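One general convention for the build-time problem is the SOURCE_DATE_EPOCH environment variable specified by the Reproducible Builds project: a build tool that needs a timestamp reads it from that variable instead of asking the clock. The Python sketch below only illustrates the idea. The function names and the header format are hypothetical, and I am not claiming this is how RPM's own tooling handles it.

```python
import os
import time


def build_timestamp(environ=None):
    """Return the timestamp a build should embed, as Unix seconds.

    If SOURCE_DATE_EPOCH is set, use it so that rebuilding later yields
    the same value; otherwise fall back to the current time.
    """
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        return int(time.time())
    return int(raw)


def build_header(package_name, environ=None):
    """Render a made-up, illustrative header line containing the build time."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(build_timestamp(environ)))
    return f"{package_name} built at {stamp}"


# With the variable pinned, two "builds" at different moments agree.
pinned = {"SOURCE_DATE_EPOCH": "1700000000"}
print(build_header("toy-package", pinned) == build_header("toy-package", pinned))
```

The embedded signature is a separate problem: a timestamp can be pinned, but a signature is added after the build, so comparing signed packages byte for byte will still show differences.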

RPM Developers meetup at DevConf.CZ 2023

From there, a hackfest was organized during Flock 2023, that year's edition of the Fedora Project's conference, where the work was given more structure: goals were formalized, an approach was defined, and documentation of the known issues began.

Page for the hackfest organized at Flock 2023

A recap on the hackfest is available on Discourse, for those who might be interested.

Recap on the hackfest

Finally, the project was announced on the Devel list in March 2024, kicking off the real work on reproducible builds for good. Now, it remains to be seen what progress will be made in the following months!

Formal announcement on Devel

Flock 2025 will be held in early June this year. Sadly, odds are I will not be able to physically be there, but it's an event worth staying tuned for: hopefully, we will know more by then. There is an interesting talk on a "better dist-git ci" in the schedule already, among several others that seem related, so I have little doubt we will know more very soon.

Landing page for Flock 2025
