There are a lot of reasons to prefer reproducible builds, and many of them are not security related... It seems a bit presumptuous to argue that no one needs reproducible builds because one particular security argument is flawed.
First, a non-flawed security argument: it only takes one non-malicious person to build a package from source and find that it doesn't match the distributed binary to spot a problem. Sure, if you don't compile the binaries yourself, you might not find out until later that a binary was compromised, but that's still better than never finding out. The reality is that most people don't want to spend time building all their packages from source...
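To make that concrete, here's a minimal sketch of that check, assuming the build is reproducible: rebuild the package locally and compare hashes against what the distributor shipped. The build command and file paths are hypothetical placeholders for whatever your packaging tooling actually runs.

```python
import hashlib
import subprocess

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_distributed_binary(build_cmd, local_artifact, distributed_binary):
    subprocess.run(build_cmd, check=True)   # reproduce the build locally
    # With reproducible builds, a mismatch is a red flag worth raising,
    # whether the cause turns out to be toolchain drift or a compromise.
    return sha256(local_artifact) == sha256(distributed_binary)
```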
More generally, reproducible builds make build artifacts a pure function of their inputs. There are countless reasons why this might be desirable.
- If a binary is lost, it can be rebuilt exactly as it was. You only need to ensure the source is preserved.
- If a particular version of the code is tested, and the binary is not a pure function of the code, then you haven't really tested the binary. Bugs could still be introduced that were not caught during testing because your build is non-deterministic.
- It provides a foundation for your entire OS image to be built deterministically.
- If you use a build cache, intermediate artifacts can be cached more easily and take less space. For example, changing the code from A -> B -> A results in two distinct artifacts instead of three (see the sketch below).
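To illustrate that caching point, here's a toy sketch of a content-addressed build cache. The "compiler" is a stand-in; the only point is that with deterministic outputs the cache key can be a hash of the inputs, so reverting a change reuses the old entry.

```python
import hashlib

class BuildCache:
    def __init__(self):
        self.artifacts = {}  # input hash -> built artifact

    def build(self, source: bytes) -> bytes:
        key = hashlib.sha256(source).hexdigest()
        if key not in self.artifacts:
            # Stand-in for the real (deterministic) compiler.
            self.artifacts[key] = b"compiled:" + source
        return self.artifacts[key]

cache = BuildCache()
cache.build(b"A")
cache.build(b"B")
cache.build(b"A")                  # reverting reuses the first entry
assert len(cache.artifacts) == 2   # two distinct artifacts, not three
```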
If the new version is busted, I want to rebuild the same output only changing the compiler, or package manager, or one of the library upgrades, and see what happens.
If I can't reproduce last Tuesday, then how do I get back to Friday when shit wasn't on fire?
This goes the other way too. Repeatability is daylight that removes our ability to delude ourselves that a new problem must be someone else's code.
I don't want to police my coworkers. I just want to know that the infrastructure will hold my proverbial weight when I step on it, so I don't have to be afraid of side effects all day.
I want to know what I can trust them to do, and work to expand that envelope. If people break stuff, I eventually want them to be able to figure it out, reproduce it, and fix it all under their own steam, and [believe] that it's the right thing to do.
My automation strategy is a superset of a list I'm sure you've all heard already:
Make it documented
Make it work (automated)
Make it right (trustworthy)
Make it recommended
Make it easy (may include fast)
Make it mandatory
Make it an HR problem
About the time the tool is starting to get easy, you can start teasing people for not using it, but at some point it's expected behavior. More peer pressure from more sources. If they still like to cowboy, that speaks to trust, and they start getting delisted from new initiatives. If that still doesn't work (which, sadly, occasionally is the case), they are compartmentalized and shortlisted for the next reorg or layoff.
Reproducible builds make diagnosing performance regressions less horrible, because they are less likely to have been caused by random shuffling of code, and any random shuffling is reproducible when trying to bisect.
In critical infrastructure, it's often essential to provide customers with a single fix. If you can't reproduce the build that they have, then you can't do that.
We have an internal tool that grovels through dependencies looking for first party code, and then pulls that thread all the way to hyperlinks to the ticket numbers and the commit diffs.
So not only can we build another copy with the same third-party code, but when we ship a bug fix we can validate that only that one change actually made it into the bugfix build.
Even with infrastructure to do surgery on dependencies, it's quite possible to fat finger something and get too much or not enough. So we built a sanity tool that gives the engineering sign-off prior to validation a bit of gravitas.
> - If a particular version of the code is tested, and the binary is not a pure function of the code, then you haven't really tested the binary. Bugs could still be introduced that were not caught during testing because your build is non-deterministic.
This is a weaker argument IMO because when building for test, generally, all optimizations are disabled, debug info is emitted, symbols are un-stripped, and so on. The unit under test is usually very different from the shipped artifact even at the module level. Not least because the test functions are compiled in.
You do see this in embedded for several reasons, including:
- For high-volume production, reducing ROM size can have a big impact on profitability (this is less true than it was 20 years ago, but still true), so your dev boards will have large EPROMs and your production boards will have small ROMs
- Debugging tools present may allow for easier reverse-engineering of your devices
Obviously the devices go through a lot of testing in the production environment, but things like error-injection just may not exist at all, which limits how much you can test.
Compared to basically every other part of release qualification (manual QA, canarying, etc.), re-testing on the prod build is so unbelievably cheap there's no reason not to.
I suppose we're referring to different kinds of testing. Manual QA, etc, on prod sure.
But if you're building client software artifacts, unit testing or integration testing involves building different software, in a different configuration, and running it in a test harness. To facilitate unit testing or integration testing client software you:
- Build with a lower optimization level (-O0 usually) so that the generated code bears even a passing resemblance to what you actually wrote and your debugger can follow along.
- Generate debug info.
- Avoid stripping symbols.
- Enable logging.
- Build and link your test code into a library artifact.
- Run it in a test harness.
That's not testing what you ship. It's testing something pretty close, obviously, but it's not the deterministic build artifact you actually ship.
On the contrary; it's quite possible to design automated tests that operate on release artifacts. This is true not only at the integration level (testing the external interfaces of the artifact in a black-box manner), but also at a more granular level; e.g., running lower-level unit tests in your code's dependency structure.
It's true that not all tests which are possible to run in debug configuration can also be run on a release artifact; e.g. if there are test-only interfaces that are compiled out in the release configuration.
I think maybe the source of the confusion in this conversation is perhaps the kind of artifact being tested? For example, if I were developing ffmpeg, to choose an arbitrary example, I would absolutely have tests which operate on the production artifact -- the binary compiled in release mode -- which only exercise public interfaces of the tool; e.g. a test which transcodes file A to file B and asserts correctness in some way. This kind of test should be absolutely achievable both in dev builds as well as when testing the deliverable artifact.
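As a concrete sketch of that kind of black-box test (assuming ffmpeg and ffprobe are on PATH, and sample.wav is a hypothetical fixture checked in next to the test):

```python
import os
import subprocess
import tempfile

def test_transcode_wav_to_mp3():
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.mp3")
        # Drive the release artifact exactly the way a user would.
        subprocess.run(["ffmpeg", "-y", "-i", "sample.wav", out], check=True)
        # Assert correctness through another public interface: probe the result.
        probe = subprocess.run(
            ["ffprobe", "-v", "error",
             "-show_entries", "format=format_name",
             "-of", "default=noprint_wrappers=1:nokey=1", out],
            check=True, capture_output=True, text=True)
        assert probe.stdout.strip() == "mp3"
```

The same test runs unchanged against a dev build or the deliverable binary; only the artifact under test differs.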
> I, uhh, usually do this in my released software too.
Do you have any idea how annoying it is to get logged garbage when starting something on the command line (looking at you, IntelliJ)?
I once spent several weeks hunting through Hadoop stack traces for a null pointer exception that was being thrown in a log function. If the logging wasn’t being done in production, I wouldn’t have wasted my life and could have been doing useful things. Sadly, shutting down the cluster to patch it wasn’t an option, so I had to work around it by calling something unrelated to ensure the variable wasn’t null when it did log.
Yes, which is why I regularly (think quarterly or annually) check to make sure we have good log hygiene, and are logging at appropriate log levels and not logging useless information.
I have alerting set up to page me if the things I care about start logging more than the occasional item at ERROR, so I have to pay some attention or I get pestered.
Hrmm. Surely the vast majority of testing happens on non-release builds, despite the fact that release builds may also be tested. Unit tests are generally fastbuild artifacts that are linked with many objects that are not in the release, including the test's main function and the test cases themselves. Integration tests and end-to-end tests often run with NDEBUG undefined and with things like sanitizers and checked allocators. I would say that hardly anyone runs unit tests on release build artifacts just because it takes forever to produce them.
When I was at Google we ran most tests both with production optimizations and without. There is no reason not to do it since the cost of debugging those problems is huge.
> Surely the vast majority of testing happens on non-release builds, despite the fact that release builds may also be tested.
Of course.
> I would say that hardly anyone runs unit tests on release build artifacts just because it takes forever to produce them.
I don't know that this follows: just because 99% of the invocations of your unit test are in fastbuild doesn't mean that you don't also test everything in opt at least once.
I can't remember seeing any cc_test target at Google that ran with realistic release optimizations (AutoFDO/SamplePGO+LTO) and even if they did it's still not the release binary because it links in the test case and the test main function.
Did you look in the CI system for configurations there? I see FDO enabled in those tests. (Speaking at a high level, configurations can be modified in bazelrc and with flags without being explicitly listed in the cc_test rule itself)
> release binary because it links in the test case and the test main function.
Sure, but it's verifiably the same object files as get put into the release artifact.
Note: This article is not talking about deterministic builds (which are a prerequisite for reproducible builds), but specifically reproducible builds.
Reproducible builds are, generally speaking, interesting only from a security perspective, while deterministic builds have all sorts of infrastructural benefits that the author agrees are useful.
And if you think I'm being pedantic, I'm using the official terminology from the reproducible builds site[0].
You're semantically right, but also missing the point. The expense of getting to deterministic builds is large - You have to take great care in your build infrastructure and scripts. The benefits are also large, and worth it.
Once you've gotten to deterministic builds, the expense of getting to reproducible builds is small; typically days' worth of work as opposed to months. The benefits are very different, but far from insignificant, and almost invaluable from a security perspective.
If you're going to do deterministic builds, go for broke - Do reproducible builds.
It really depends. If I’m building a Java project, I’m pretty sure I’ve got a deterministic build just by running javac pointed at a source directory. If I want a reproducible build, I probably need to do a lot more:
- ensure timestamps of all files embedded in jar files are consistent (see the sketch after this list)
- ensure there is no BuildTime/BuildHost/BuildNumber variable of any kind being captured
- ensure the exact version of compiler is documented
- ensure exact versions of all dependencies in the classpath are captured
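As a minimal sketch of the timestamp item: since a jar is a zip, you can rewrite its metadata after the fact. This assumes the jar's contents are already deterministic; only the zip timestamps and entry order are normalized here.

```python
import zipfile

# Zip can't represent dates before 1980, so this is the usual "epoch"
# choice for reproducible archives.
FIXED_TIMESTAMP = (1980, 1, 1, 0, 0, 0)

def normalize_jar(src_path, dst_path):
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w") as dst:
        # Sort entries so archive order doesn't depend on build order.
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIMESTAMP)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = src.getinfo(name).external_attr
            dst.writestr(info, src.read(name))
```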
There's an interesting pattern I've found in this: Don't ensure that there's no BuildTime/BuildHost/BuildNumber embedded. Ensure that all variables that are part of the build are captured and embedded. That is - It's okay for your build to include the Build Time, but that's an assertion at build time. Include it as a build output. Binaries should include all of the mutable environment used to build them as an embed. As in, their --version output should include them.
# bazel version
Build label: 2.0.0
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 19 12:33:30 2019 (1576758810)
Build timestamp: 1576758810
Build timestamp as int: 1576758810
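A minimal sketch of that capture-and-embed pattern, with illustrative names (the fields chosen and the output file are assumptions, not any particular tool's format): record the mutable parts of the environment at build time and ship them next to (or inside) the artifact so --version can print them verbatim.

```python
import json
import platform
import subprocess
import time

def capture_build_info():
    # Runs at build time; the result gets embedded in the artifact.
    return {
        "build_time": int(time.time()),
        "build_host": platform.node(),
        "git_commit": subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True).stdout.strip(),
    }

if __name__ == "__main__":
    # Written alongside the artifact so a hypothetical `mytool --version`
    # can echo exactly what went into this build.
    with open("build_info.json", "w") as f:
        json.dump(capture_build_info(), f, indent=2, sort_keys=True)
```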
Deterministic build means every time you compile the same source you get the same executable. Reproducible build means that you specify and relay enough information to allow everyone else to reproduce your results for themselves in their own environment.
Completely agree! The article is blinkered to the security aspect. But imagine if compilers didn't create reproducible builds. Debugging would be a nightmare!
Uh? Is that sarcasm?
Compilers don't produce reproducible builds.
If you try to investigate a core dump using a binary recompiled from sources instead of the original binary, it's very likely you won't be able to analyze the core.
By default you're not guaranteed the exact same output in two compiled binaries. There's a lot of variable bits[1] that make into binaries from C and C++. Different languages/compilers have different levels of variable bits.
Yes, that would be reproducibility iff the environment is identical. However, "identical environment" is a complicated issue.
Differing file paths, timestamps, and host date/time can all easily make their way into a binary through macros in several languages without explicit compiler/linker flags. If compiled artifacts are bundled into a container (like a jar file), their metadata needs to be deterministically set or else the container as an artifact won't be deterministic.
So yes, doing all the work to make the build deterministic enables reproducibility, but it's not free or automatic. Then doing the work to ensure the build environment is deterministic is an additional task that's not free or automatic.
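For example, a minimal sketch of pinning some of those environment-derived inputs before invoking an otherwise-unchanged build. SOURCE_DATE_EPOCH is the convention many toolchains honor for embedded timestamps; the build command itself is a placeholder.

```python
import os
import subprocess

def run_pinned_build(build_cmd, commit_timestamp):
    env = dict(os.environ)
    # Embedded timestamps come from the source, not the wall clock.
    env["SOURCE_DATE_EPOCH"] = str(commit_timestamp)
    # Keep timezone and locale from leaking into generated output.
    env["TZ"] = "UTC"
    env["LC_ALL"] = "C"
    subprocess.run(build_cmd, check=True, env=env)
```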
Since when don't compilers produce reproducible builds? We did that at my last workplace with appropriate MSVC compiler options.
In any case, maybe parent is referring to using centralized debug symbols which can work for anybody in the org because their compilers all generate the same output.
Having stable BuildIDs has been important for me in being able to sanely manage a debug symbol archive where some binaries are periodically rebuilt on different CI worker nodes.
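For what it's worth, a minimal sketch of filing binaries into a build-ID-keyed symbol archive (the .build-id/xx/rest.debug layout GDB understands), assuming an ELF toolchain with readelf available; paths are illustrative.

```python
import re
import shutil
import subprocess
from pathlib import Path

def gnu_build_id(binary):
    # readelf -n prints the note sections, including the GNU build ID.
    notes = subprocess.run(["readelf", "-n", binary],
                           capture_output=True, text=True, check=True).stdout
    match = re.search(r"Build ID:\s*([0-9a-f]+)", notes)
    if not match:
        raise ValueError(f"no GNU build ID note in {binary}")
    return match.group(1)

def archive_symbols(debug_file, archive_root):
    bid = gnu_build_id(debug_file)
    dest = Path(archive_root) / ".build-id" / bid[:2] / f"{bid[2:]}.debug"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(debug_file, dest)
    return dest
```

With stable build IDs, rebuilding the same source on a different CI node files the symbols under the same key instead of duplicating them.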
This is exactly right. Having the build process be 'reproducible' i.e. deterministic means it is insulated from mysterious action-at-a-distance mechanisms that break things I want or need to be invariant within my CI/CD process.
>If a binary is lost, it can be rebuilt exactly as it was. You only need to ensure the source is preserved.
Wouldn't this depend upon the environment as well? Unless the build starts off by creating a build environment, but then we are halfway to "To first make bread from scratch, create a universe...".
If you run your builds in Docker, this is usually taken care of.
If not, it seems like it'd be good practice to document exactly what build dependencies are used; I've had to track down a regression introduced by inlined code from a dependency when we weren't tracking build dependency versions, and the fact that a bug was introduced with no changes in the relevant section of the code was troublesome to say the least.
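A minimal sketch of that "document the build dependencies" practice, recording toolchain versions next to the artifact; the tool list is just an example, swap in whatever your build actually invokes.

```python
import json
import subprocess

TOOLS = {
    "cc": ["cc", "--version"],
    "ld": ["ld", "--version"],
    "make": ["make", "--version"],
}

def toolchain_manifest():
    manifest = {}
    for name, cmd in TOOLS.items():
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        manifest[name] = out.stdout.splitlines()[0]  # first line carries the version
    return manifest

if __name__ == "__main__":
    with open("toolchain.json", "w") as f:
        json.dump(toolchain_manifest(), f, indent=2, sort_keys=True)
```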
Tangential, but I actually switched back to Maven after using Gradle on a fairly large project for about a year. The incremental/cached builds of Gradle were awesome, but I found writing build.gradle files to be a bit too hacky. I can achieve everything I need to with some simple configuration of an existing Maven plugin, whereas with Gradle, it always felt like I was doing something super custom and fragile.
I'd still love to set up a Gradle cache server, sounds so fancy!
Reproducible builds also tend to create isolation. I.e. if I compile one C file and then change another, recompiling the second one usually doesn't affect the first. This is useful when trying to debug code. E.g. I can add print statements to callers to get more info surrounding a bug, whereas in a non-reproducible state the recompilation itself may have eliminated the bug.