Firecracker is great and all, but the core idea here described works also with p...

colinchartier · on April 17, 2022

Author here!

The three big differences are:

1. Docker doesn't deal with running processes (like postgres or redis), only the filesystem state

2. Docker doesn't have enough isolation, so you'd probably need to run it within qemu or firecracker for compliance in bigger teams

3. Docker-in-docker is still pretty painful: if you need to do anything nonstandard like change the size of /dev/shm, access /dev/kvm, or load kernel drivers, it'll take custom configuration.

throwaway894345 · on April 17, 2022

I’m confused. Why do you need to snapshot live processes? Are we concerned about startup time of Postgres or whatever? Also, why is isolation needed for e2e tests? Lastly, why is docker-in-docker a requirement, and how is that easier than qemu in qemu or qemu in docker or whatever?

colinchartier · on April 17, 2022

> Why do you need to snapshot live processes?

Oftentimes there are long-lived processes which rarely change but take a long time to warm up: the Bazel [1] agent for C++ projects, the buildkit [2] state for docker, or the running Postgres or Redis server for a cloud native app, for example.

It's why running "docker build" twice on your laptop is so fast, but running "docker build" in CI seems glacially slow.

> why is docker-in-docker a requirement, and how is that easier than qemu in qemu or qemu in docker or whatever?

The example given was running "docker-compose build", so you'd need either docker-in-firecracker (this post), docker-in-docker, or docker-in-qemu. You'd almost never run docker-compose build on bare metal in practice, because you'd immediately need to send the images you built somewhere else in order to use them.

[1] https://bazel.build/ [2] https://docs.docker.com/develop/develop-images/build_enhance...

cpuguy83 · on April 17, 2022

But that's state on disk, not process state. It should not affect startup time in buildkit.

I'm not experienced enough with Bazel to comment on that.

TheDong · on April 18, 2022

> 1. Docker doesn't deal with running processes (like postgres or redis), only the filesystem state

Docker in practice usually doesn't, but in theory it can and should.

A checkpoint/restore demo, of live-migrating running docker containers between servers, including running process state, was shown at Dockercon 2015. I can't find a video right now, but it was demoed then.

The "docker checkpoint" command has existed since docker 1.13.0 (released 2017), which allows checkpointing and restoring the running state of containers, including running processes, with some limitations. https://docs.docker.com/engine/reference/commandline/checkpo...

The reality is just unfortunately that these features are experimental, have been for the last 7+ years, and have taken far longer to mature than would be expected.
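
For reference, the checkpoint and restore steps mentioned above can be sketched as the two CLI invocations below. This is only a rough sketch: the container name "warm-postgres" and checkpoint name "after-seed" are made up, the helper functions are hypothetical, and the commands only work when the daemon has experimental features enabled and CRIU is installed. The script just prints the commands rather than running them.

```python
import shlex


def checkpoint_cmd(container: str, name: str, leave_running: bool = False) -> list[str]:
    """Build the argv for `docker checkpoint create` (experimental feature)."""
    cmd = ["docker", "checkpoint", "create"]
    if leave_running:
        # By default the container is stopped once the checkpoint is written.
        cmd.append("--leave-running")
    cmd += [container, name]
    return cmd


def restore_cmd(container: str, name: str) -> list[str]:
    """Build the argv that starts a container from a saved checkpoint."""
    return ["docker", "start", "--checkpoint", name, container]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(shlex.join(checkpoint_cmd("warm-postgres", "after-seed")))
    print(shlex.join(restore_cmd("warm-postgres", "after-seed")))
```

Passing the returned lists to subprocess.run would execute them on a host that meets the requirements above.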

ignoramous · on April 17, 2022

Hi, offtopic but: is webapp.io a pivot from layerci, or just a rebranding?

Interesting that you folks now use firecracker. I assume it now fills in adequately for the previously homegrown tech at layerci [0]?

[0] https://news.ycombinator.com/item?id=25979941

colinchartier · on April 17, 2022

Just a rebranding! (The technology's gotten better as well, of course - we didn't used to use firecracker at all)

https://webapp.io/blog/layerci-has-rebranded-to-webapp-io/

cpuguy83 · on April 17, 2022

Docker does handle snapshots of running processes. It's called checkpoint/restore, and it uses the CRIU tooling to do this.

In terms of doing this in a CI env like actions where you may have different types of machines serving you, it may be problematic as the machine specs need to pretty closely match.
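
To make the "specs need to match" concern concrete, here is a rough sketch of one check you could run before restoring on a different machine: compare the CPU feature flags of the host that took the checkpoint with those of the host restoring it. This is a simplification and not CRIU's own logic; the function names are made up, and it assumes the x86 Linux /proc/cpuinfo format, where a "flags" line lists the features.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_target(source_cpuinfo: str, target_cpuinfo: str) -> set[str]:
    """Flags the checkpointing host had that the restoring host lacks."""
    return cpu_flags(source_cpuinfo) - cpu_flags(target_cpuinfo)


if __name__ == "__main__":
    # Made-up sample data; on a real host, read /proc/cpuinfo on each machine.
    source = "model name\t: example\nflags\t\t: fpu sse sse2 avx2\n"
    target = "model name\t: example\nflags\t\t: fpu sse sse2\n"
    print(sorted(missing_on_target(source, target)))
```

An empty result only means the flags line up; kernel versions and other differences between machines can still break a restore.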

jitl · on April 17, 2022

Yeah, I don’t like that the article itself treats building the DB seed data, etc, into the Firecracker VM image like this is impossible to do in Docker. The techniques are good things to do — but it’s very tenuous how the techniques are connected to Firecracker.

I’ve done all of the above using multi-layered Dockerfiles and a cron CI job to rebuild the base integration test image every 6 hours. Sure, if you need the isolation, Firecracker is the way to go. But if you invest primarily in container shenanigans to speed up CI with Docker, it’s not too much extra work to wrap it in a Firecracker VM, plain QEMU, or whatever once you start wanting more isolation.

Also, maybe I’m holding it wrong, but Docker in Docker has not bitten us yet on our GitHub action runners.

lmeyerov · on April 18, 2022

Yep, buildkit does incremental builds quite well

We find the dominating factor in (our) incremental builds / CI to be network/io caching, which has less to do with firecracker/docker and more with the surrounding hw/sw (gha topology & smarts, IO speed, ...). It's a real problem in GPU/AI CI where we get monster image sizes. There were some cool blog posts ~last year on caching and routing tricks happening at GH (joint with MSR?), but they've seemingly gone silent.

lgierth · on April 17, 2022

You don't need a management daemon running though, and get a complete virtualized kernel that can be customized if needed.

bornfreddy · on April 17, 2022

Ok, so IIUC, the main difference with firecracker versus docker is that processes are better separated from each other ("micro VM" instead of namespaces) and that one can run a customized kernel. But for e2e tests I've written, neither of these advantages mattered.

I do love the idea of taking a snapshot of a prebuilt database image and can see where this would really speed up the tests.