There are a lot of reasons to prefer reproducible builds, and many of them are not security related... It seems a bit presumptuous to argue that no one needs reproducible builds because one particular security argument is flawed.
First, a non-flawed security argument: it only takes one non-malicious person to build a package from source and find that it doesn't match the distributed binary to spot a problem. Sure, if you don't compile the binaries yourself, you might not find out until later that a binary was compromised, but that's still better than never finding out. The reality is that most people don't want to spend time building all their packages from source...
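To make that concrete, here's a minimal sketch of that check, assuming the build is reproducible: rebuild the package locally and compare hashes against what the distributor shipped. The build command and file paths are hypothetical placeholders for whatever your packaging tooling actually runs.

```python
import hashlib
import subprocess

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_distributed_binary(build_cmd, local_artifact, distributed_binary):
    subprocess.run(build_cmd, check=True)   # reproduce the build locally
    # With reproducible builds, a mismatch is a red flag worth raising,
    # whether the cause turns out to be toolchain drift or a compromise.
    return sha256(local_artifact) == sha256(distributed_binary)
```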
More generally, reproducible builds make build artifacts a pure function of their inputs. There are countless reasons why this might be desirable.
- If a binary is lost, it can be rebuilt exactly as it was. You only need to ensure the source is preserved.
- If a particular version of the code is tested, and the binary is not a pure function of the code, then you haven't really tested the binary. Bugs could still be introduced that were not caught during testing because your build is non-deterministic.
- It provides a foundation for your entire OS image to be built deterministically.
- If you use a build cache, intermediate artifacts can be cached more easily and take less space. For example, changing the code from A -> B -> A results in two distinct artifacts instead of three (see the sketch below).
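To illustrate that caching point, here's a toy sketch of a content-addressed build cache. The "compiler" is a stand-in; the only point is that with deterministic outputs the cache key can be a hash of the inputs, so reverting a change reuses the old entry.

```python
import hashlib

class BuildCache:
    def __init__(self):
        self.artifacts = {}  # input hash -> built artifact

    def build(self, source: bytes) -> bytes:
        key = hashlib.sha256(source).hexdigest()
        if key not in self.artifacts:
            # Stand-in for the real (deterministic) compiler.
            self.artifacts[key] = b"compiled:" + source
        return self.artifacts[key]

cache = BuildCache()
cache.build(b"A")
cache.build(b"B")
cache.build(b"A")                  # reverting reuses the first entry
assert len(cache.artifacts) == 2   # two distinct artifacts, not three
```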
If the new version is busted, I want to rebuild the same output only changing the compiler, or package manager, or one of the library upgrades, and see what happens.
If I can't reproduce last Tuesday, then how do I get back to Friday when shit wasn't on fire?
This goes the other way too. Repeatability is daylight that removes our ability to delude ourselves that a new problem must be someone else's code.
I don't want to police my coworkers. I just want to know that the infrastructure will hold my proverbial weight when I step on it, so I don't have to be afraid of side effects all day.
I want to know what I can trust them to do, and work to expand that envelope. If people break stuff, I eventually want them to be able to figure it out, reproduce it, and fix it all under their own steam, and [believe] that it's the right thing to do.
My automation strategy is a superset of a list I'm sure you've all heard already:
Make it documented
Make it work (automated)
Make it right (trustworthy)
Make it recommended
Make it easy (may include fast)
Make it mandatory
Make it an HR problem
About the time the tool is starting to get easy, you can start teasing people for not using it, but at some point it's expected behavior. More peer pressure from more sources. If they still like to cowboy, that speaks to trust, and they start getting delisted from new initiatives. If that still doesn't work (which, sadly, occasionally is the case), they are compartmentalized and shortlisted for the next reorg or layoff.
Reproducible builds make diagnosing performance regressions less horrible, because they are less likely to have been caused by random shuffling of code, and any random shuffling is reproducible when trying to bisect.
In critical infrastructure, it's often essential to provide customers with a single fix. If you can't reproduce the build that they have, then you can't do that.
We have an internal tool that grovels through dependencies looking for first party code, and then pulls that thread all the way to hyperlinks to the ticket numbers and the commit diffs.
So not only can we build another copy with the same third-party code, but when we ship a bug fix we can validate that only that one change actually made it into the bugfix build.
Even with infrastructure to do surgery on dependencies, it's quite possible to fat finger something and get too much or not enough. So we built a sanity tool that gives the engineering sign-off prior to validation a bit of gravitas.
> - If a particular version of the code is tested, and the binary is not a pure function of the code, then you haven't really tested the binary. Bugs could still be introduced that were not caught during testing because your build is non-deterministic.
This is a weaker argument IMO because when building for test, generally, all optimizations are disabled, debug info is emitted, symbols are un-stripped, and so on. The unit under test is usually very different from the shipped artifact even at the module level. Not least because the test functions are compiled in.
You do see this in embedded for several reasons, including:
- For high-volume production, reducing ROM size can have a big impact on profitability (this is less true than it was 20 years ago, but still true), so your dev boards will have large EPROMs and your production boards will have small ROMs
- Debugging tools present may allow for easier reverse-engineering of your devices
Obviously the devices go through a lot of testing in the production environment, but things like error-injection just may not exist at all, which limits how much you can test.
Compared to basically every other part of release qualification (manual QA, canarying, etc.), re-testing on the prod build is so unbelievably cheap there's no reason not to.
I suppose we're referring to different kinds of testing. Manual QA, etc, on prod sure.
But if you're building client software artifacts, unit testing or integration testing involves building different software, in a different configuration, and running it in a test harness. To facilitate unit testing or integration testing client software you:
- Build with a lower optimization level (-O0 usually) so that the generated code bears even a passing resemblance to what you actually wrote and your debugger can follow along.
- Generate debug info.
- Avoid stripping symbols.
- Enable logging.
- Build and link your test code into a library artifact.
- Run it in a test harness.
That's not testing what you ship. It's testing something pretty close, obviously, but it's not the deterministic build artifact you actually ship.
On the contrary; it's quite possible to design automated tests that operate on release artifacts. This is true not only at the integration level (testing the external interfaces of the artifact in a black-box manner), but also at a more granular level; e.g., running lower-level unit tests in your code's dependency structure.
It's true that not all tests which are possible to run in debug configuration can also be run on a release artifact; e.g. if there are test-only interfaces that are compiled out in the release configuration.
I think maybe the source of the confusion in this conversation is perhaps the kind of artifact being tested? For example, if I were developing ffmpeg, to choose an arbitrary example, I would absolutely have tests which operate on the production artifact -- the binary compiled in release mode -- which only exercise public interfaces of the tool; e.g. a test which transcodes file A to file B and asserts correctness in some way. This kind of test should be absolutely achievable both in dev builds as well as when testing the deliverable artifact.
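As a concrete sketch of that kind of black-box test (assuming ffmpeg and ffprobe are on PATH, and sample.wav is a hypothetical fixture checked in next to the test):

```python
import os
import subprocess
import tempfile

def test_transcode_wav_to_mp3():
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.mp3")
        # Drive the release artifact exactly the way a user would.
        subprocess.run(["ffmpeg", "-y", "-i", "sample.wav", out], check=True)
        # Assert correctness through another public interface: probe the result.
        probe = subprocess.run(
            ["ffprobe", "-v", "error",
             "-show_entries", "format=format_name",
             "-of", "default=noprint_wrappers=1:nokey=1", out],
            check=True, capture_output=True, text=True)
        assert probe.stdout.strip() == "mp3"
```

The same test runs unchanged against a dev build or the deliverable binary; only the artifact under test differs.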
> I, uhh, usually do this in my released software too.
Do you have any idea how annoying it is to get logged garbage when starting something on the command line (looking at you, IntelliJ)?
I once spent several weeks hunting through Hadoop stack traces for a null pointer exception that was being thrown in a log function. If the logging wasn’t being done in production, I wouldn’t have wasted my life and could have been doing useful things. Sadly, shutting down the cluster to patch it wasn’t an option, so I had to work around it by calling something unrelated to ensure the variable wasn’t null when it did log.
Yes, which is why I regularly (think quarterly or annually) check to make sure we have good log hygiene, and are logging at appropriate log levels and not logging useless information.
I have alerting set up to page me if the things I care about start logging more than the occasional item at ERROR, so I have to pay some attention or I get pestered.
Hrmm. Surely the vast majority of testing happens on non-release builds, despite the fact that release builds may also be tested. Unit tests are generally fastbuild artifacts that are linked with many objects that are not in the release, including the test's main function and the test cases themselves. Integration tests and end-to-end tests often run with NDEBUG undefined and with things like sanitizers and checked allocators. I would say that hardly anyone runs unit tests on release build artifacts just because it takes forever to produce them.
When I was at Google we ran most tests both with production optimizations and without. There is no reason not to do it since the cost of debugging those problems is huge.
> Surely the vast majority of testing happens on non-release builds, despite the fact that release builds may also be tested.
Of course.
> I would say that hardly anyone runs unit tests on release build artifacts just because it takes forever to produce them.
I don't know that this follows: just because 99% of the invocations of your unit test are in fastbuild doesn't mean that you don't also test everything in opt at least once.
I can't remember seeing any cc_test target at Google that ran with realistic release optimizations (AutoFDO/SamplePGO+LTO) and even if they did it's still not the release binary because it links in the test case and the test main function.
Did you look in the CI system for configurations there? I see FDO enabled in those tests. (Speaking at a high level, configurations can be modified in bazelrc and with flags without being explicitly listed in the cc_test rule itself)
> release binary because it links in the test case and the test main function.
Sure, but it's verifiably the same object files as get put into the release artifact.
Note: This article is not talking about deterministic builds (which are a prerequisite for reproducible builds), but specifically reproducible builds.
Reproducible builds are, generally speaking, interesting only from a security perspective, while deterministic builds have all sorts of infrastructural benefits that the author agrees are useful.
And if you think I'm being pedantic, I'm using the official terminology from the reproducible builds site[0].
You're semantically right, but also missing the point. The expense of getting to deterministic builds is large - You have to take great care in your build infrastructure and scripts. The benefits are also large, and worth it.
Once you've gotten to deterministic builds, the expense of getting to reproducible builds is small; typically days' worth of work as opposed to months. The benefits are very different, but far from insignificant, and almost invaluable from a security perspective.
If you're going to do deterministic builds, go for broke - Do reproducible builds.
It really depends. If I’m building a Java project, I’m pretty sure I’ve got a deterministic build just by running javac pointed at a source directory. If I want a reproducible build, I probably need to do a lot more:
- ensure timestamps of all files embedded in jar files are consistent (see the sketch after this list)
- ensure there is no BuildTime/BuildHost/BuildNumber variable of any kind being captured
- ensure the exact version of compiler is documented
- ensure exact versions of all dependencies in the classpath are captured
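As a minimal sketch of the timestamp item: since a jar is a zip, you can rewrite its metadata after the fact. This assumes the jar's contents are already deterministic; only the zip timestamps and entry order are normalized here.

```python
import zipfile

# Zip can't represent dates before 1980, so this is the usual "epoch"
# choice for reproducible archives.
FIXED_TIMESTAMP = (1980, 1, 1, 0, 0, 0)

def normalize_jar(src_path, dst_path):
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w") as dst:
        # Sort entries so archive order doesn't depend on build order.
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIMESTAMP)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = src.getinfo(name).external_attr
            dst.writestr(info, src.read(name))
```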
There's an interesting pattern I've found in this: Don't ensure that there's no BuildTime/BuildHost/BuildNumber embedded. Ensure that all variables that are part of the build are captured and embedded. That is - It's okay for your build to include the Build Time, but that's an assertion at build time. Include it as a build output. Binaries should include all of the mutable environment used to build them as an embed. As in, their --version output should include them.
# bazel version
Build label: 2.0.0
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 19 12:33:30 2019 (1576758810)
Build timestamp: 1576758810
Build timestamp as int: 1576758810
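A minimal sketch of that capture-and-embed pattern, with illustrative names (the fields chosen and the output file are assumptions, not any particular tool's format): record the mutable parts of the environment at build time and ship them next to (or inside) the artifact so --version can print them verbatim.

```python
import json
import platform
import subprocess
import time

def capture_build_info():
    # Runs at build time; the result gets embedded in the artifact.
    return {
        "build_time": int(time.time()),
        "build_host": platform.node(),
        "git_commit": subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True).stdout.strip(),
    }

if __name__ == "__main__":
    # Written alongside the artifact so a hypothetical `mytool --version`
    # can echo exactly what went into this build.
    with open("build_info.json", "w") as f:
        json.dump(capture_build_info(), f, indent=2, sort_keys=True)
```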
Deterministic build means every time you compile the same source you get the same executable. Reproducible build means that you specify and relay enough information to allow everyone else to reproduce your results for themselves in their own environment.
Completely agree! The article is blinkered to the security aspect. But imagine if compilers didn't create reproducible builds. Debugging would be a nightmare!
Uh? Is that sarcasm?
Compilers don't produce reproducible builds.
If you try to investigate a core dump using a binary recompiled from sources instead of the original binary, it's very likely you won't be able to analyze the core.
By default you're not guaranteed the exact same output in two compiled binaries. There's a lot of variable bits[1] that make into binaries from C and C++. Different languages/compilers have different levels of variable bits.
Yes, that would be reproducibility iff the environment is identical. However, "identical environment" is a complicated issue.
Differing file paths, timestamps, and host date/time can all easily make their way into a binary through macros in several languages without explicit compiler/linker flags. If compiled artifacts are bundled into a container (like a jar file), their metadata needs to be deterministically set or else the container as an artifact won't be deterministic.
So yes, doing all the work to make the build deterministic enables reproducibility, but it's not free or automatic. Then doing the work to ensure the build environment is deterministic is an additional task that's not free or automatic.
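For example, a minimal sketch of pinning some of those environment-derived inputs before invoking an otherwise-unchanged build. SOURCE_DATE_EPOCH is the convention many toolchains honor for embedded timestamps; the build command itself is a placeholder.

```python
import os
import subprocess

def run_pinned_build(build_cmd, commit_timestamp):
    env = dict(os.environ)
    # Embedded timestamps come from the source, not the wall clock.
    env["SOURCE_DATE_EPOCH"] = str(commit_timestamp)
    # Keep timezone and locale from leaking into generated output.
    env["TZ"] = "UTC"
    env["LC_ALL"] = "C"
    subprocess.run(build_cmd, check=True, env=env)
```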
Since when don't compilers produce reproducible builds? We did that at my last workplace with appropriate MSVC compiler options.
In any case, maybe parent is referring to using centralized debug symbols which can work for anybody in the org because their compilers all generate the same output.
Having stable BuildIDs has been important for me in being able to sanely manage a debug symbol archive where some binaries are periodically rebuilt on different CI worker nodes.
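For what it's worth, a minimal sketch of filing binaries into a build-ID-keyed symbol archive (the .build-id/xx/rest.debug layout GDB understands), assuming an ELF toolchain with readelf available; paths are illustrative.

```python
import re
import shutil
import subprocess
from pathlib import Path

def gnu_build_id(binary):
    # readelf -n prints the note sections, including the GNU build ID.
    notes = subprocess.run(["readelf", "-n", binary],
                           capture_output=True, text=True, check=True).stdout
    match = re.search(r"Build ID:\s*([0-9a-f]+)", notes)
    if not match:
        raise ValueError(f"no GNU build ID note in {binary}")
    return match.group(1)

def archive_symbols(debug_file, archive_root):
    bid = gnu_build_id(debug_file)
    dest = Path(archive_root) / ".build-id" / bid[:2] / f"{bid[2:]}.debug"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(debug_file, dest)
    return dest
```

With stable build IDs, rebuilding the same source on a different CI node files the symbols under the same key instead of duplicating them.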
This is exactly right. Having the build process be 'reproducible' i.e. deterministic means it is insulated from mysterious action-at-a-distance mechanisms that break things I want or need to be invariant within my CI/CD process.
>If a binary is lost, it can be rebuilt exactly as it was. You only need to ensure the source is preserved.
Wouldn't this depend upon the environment as well? Unless the build starts off by creating a build environment, but then we are halfway to "To first make bread from scratch, create a universe...".
If you run your builds in Docker, this is usually taken care of.
If not, it seems like it'd be good practice to document exactly what build dependencies are used; I've had to track down a regression introduced by inlined code from a dependency when we weren't tracking build dependency versions, and the fact that a bug was introduced with no changes in the relevant section of the code was troublesome to say the least.
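A minimal sketch of that "document the build dependencies" practice, recording toolchain versions next to the artifact; the tool list is just an example, swap in whatever your build actually invokes.

```python
import json
import subprocess

TOOLS = {
    "cc": ["cc", "--version"],
    "ld": ["ld", "--version"],
    "make": ["make", "--version"],
}

def toolchain_manifest():
    manifest = {}
    for name, cmd in TOOLS.items():
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        manifest[name] = out.stdout.splitlines()[0]  # first line carries the version
    return manifest

if __name__ == "__main__":
    with open("toolchain.json", "w") as f:
        json.dump(toolchain_manifest(), f, indent=2, sort_keys=True)
```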
Tangential, but I actually switched back to Maven after using Gradle on a fairly large project for about a year. The incremental/cached builds of Gradle were awesome, but I found writing build.gradle files to be a bit too hacky. I can achieve everything I need to with some simple configuration of an existing Maven plugin, whereas with Gradle, it always felt like I was doing something super custom and fragile.
I'd still love to set up a Gradle cache server, sounds so fancy!
Reproducible builds also tend to create isolation. I.e. if I compile one C file and then change another, recompiling the second one usually doesn't affect the first. This is useful when trying to debug code. E.g. I can add print statements to callers to get more info surrounding a bug, whereas in a non-reproducible state the recompilation itself may have eliminated the bug.