Understanding and Hardening Linux Containers [pdf]

X86BSD · on April 21, 2016

Obviously the author put a lot of effort into this paper. Hard work shows throughout. Kudos to you sir!

Granted I realize the title is "Understanding and Hardening Linux Containers".

However, personally, this just illustrates my frustration and the frustration of others with the Linux world.

They exist in their own little echo chamber. Linux was not the first to create "containers" that are secure. There is no inclusion of other solutions outside Linux.

The reason other solutions do not have these problems is because the authors actually thought about the problem. And if the Linux camp would simply look outside their echo chamber long enough to see how others solved these problems before them this paper might not have been written.

It's just simply frustrating to watch Linux reinvent some wheels, poorly, time after time.

Flame on!

Annatar · on April 21, 2016

Totally agree with you. I fail to see the point of attempting to harden Linux "containers" (Linux actually does not have container technology, since cgroups are not containers). Why bother with that, when illumos, and SmartOS on top of it, provide free, open source Solaris zones technology? Not even installation is required, since the thing runs from random access memory, and can run Linux at the speed of bare metal?

Everybody is just flapping on Linux containers. Meanwhile illumos and SmartOS are eating Linux's lunch with techology which has been working reliably in Swiss banks since 2006, and it is free open source software. What's not to love?

There are T-shirts which say "Go away or I will replace you with a very small shell script." Now it's "Go away or I will virtualize your Linux inside of SmartOS." The damn thing runs like a bandit, while Linux containers keep on flapping trying to find a working solution with papers like these. Why bother, the problem has already been solved, go run SmartOS and enjoy a working solution!

z3t4 · on April 21, 2016

SmartOS if very nice in theory. I tried to get SmartOS working on a new Supermicro server and failed, even though I got a lot of help from their awesome community and people at Joyent.

Then I tried Linux (Ubuntu) and everything just worked! It was also very easy to find guides on the web for most things Linux.

Annatar · on April 21, 2016

If you want the JustWorks(SM), you have to get hardware which is on the Joyent engineering bill of materials' list, or research and buy hardware which illumos has drivers for.

http://eng.joyent.com/manufacturing/bom.html

SmartOS being descended from Solaris has detailed manual pages built into the operating system, because when one runs a real UNIX, one is entitled to comprehensive documentation with lots of examples.

A lot of the Oracle Solaris documentation, which is also very comprehensive, applies to and can be used on illumos derivatives. The SmartOS wiki and the smartos-discuss mailing list are good sources of SmartOS specific enhancements. Did I mention the built-in man pages yet?

RantyDave · on April 21, 2016

I've had a whole bunch of stuff just work. But if it doesn't, you're stuffed.

dozzie · on April 22, 2016

> Meanwhile illumos and SmartOS are eating Linux's lunch with techology which has been working reliably in Swiss banks since 2006

They would be eating Linux's lunch if they were used in any wider scale.

A primary objection to Solaris zones and a reason they're not used widely is that they run on Solaris and not on Linux.

dmpk2k · on April 22, 2016

SmartOS has the LX brand; in other words, SmartOS can look like a Linux kernel if it wants to. I've used Ubuntu on SmartOS, and it worked flawlessly.

Annatar · on April 22, 2016

> They would be eating Linux's lunch if they were used in any wider scale.

Linux took 20 years to make its way into the enterprises, and well, SmartOS has to start somewhere. We are only at the beginning of the journey, but with one major difference: unlike Linux, which is still maturing, an illumos based operating environment has 37 years of enterprise abuse and hardening behind it.

> A primary objection to Solaris zones and a reason they're not used widely is that they run on Solaris and not on Linux.

What exactly is objectionable about running a Solaris-like operating environment? If it is software, SmartOS provides ~14,000 packages:

https://www.perkin.org.uk/posts/building-packages-at-scale.h...

dozzie · on April 22, 2016

> [...] well, SmartOS has to start somewhere.

When SmartOS or generally Solaris-descendants' zones start being anywhere near popular, then we'll talk about eating Linux's lunch. For now the situation doesn't look like that even remotely.

> What exactly is objectionable about running a Solaris-like operating environment?

To me -- not much at first sight. But there is the factor of familiarity, which is very, very important. There are plenty of sysadmins, you know, who can recover Linux from almost any failure, and gaining similar knowledge in different OS takes quite long time. Solaris derivatives (or BSDs, for that matter) are at huge disadvantage to those people.

Annatar · on April 23, 2016

> When SmartOS or generally Solaris-descendants' zones start being anywhere near popular, then we'll talk about eating Linux's lunch. For now the situation doesn't look like that even remotely.

By eating Linux's lunch I meant how good, reliable and capable the zones and lx-branded zones technology is. It is very reliable and very capable. It works very well while Linux is still trying to find itself within the cloud container paradigm: look at how many competing solutions there are on Linux in that space, and every single one of them exists because the other one lacked something, or did not do it in a satisfctory way. That is what I meant by everybody's flapping on Linux with containers.

Meanwhile, there are no such issues or problems in the illumos and SmartOS communities: the tech used there has been designed from the ground up to be secure and work for large enterprises, and unlike the Linux cloud technology, it has been in heavy use for ten years now. illumos and SmartOS are not trying to still find themsselves, because zones, ZFS, and Crossbow network virtualization already worked for years. Back in 2006 for instance, at a place where I worked, we were already packing eight Oracle databases per 32 GB RAM system, in production. Now it's 2016, and zones have only gotten better with the addition of KVM and Crossbow network virtualization.

> But there is the factor of familiarity, which is very, very important. There are plenty of sysadmins, you know, who can recover Linux from almost any failure, and gaining similar knowledge in different OS takes quite long time. Solaris derivatives (or BSDs, for that matter) are at huge disadvantage to those people.

As for those people, they had to learn Linux too; they can go back to the roots and learn a real UNIX the correct way now, and be glad that they have the opportunity to do so.

If one truly knows UNIX, one understands Linux on a much deeper level.

dozzie · on April 23, 2016

> By eating Linux's lunch I meant how good, reliable and capable the zones and lx-branded zones technology is.

Could you then use idioms in their proper meaning, please? Because I only protested against claim that zones have taken a market share of Linux containers, which is very untrue, and at the same time not the claim you have made.

> As for those people, they had to learn Linux too; they can go back to the roots and learn a real UNIX the correct way now, and be glad that they have the opportunity to do so.

So, zones provide better platform because people can go back and learn Solaris? Or what?

Zones need to provide something tangible over containers for people to invest in learning different platform. Having just another tool, which merely does what one already can do with containers, only slightly slicker, but severly limited in applicability because parent platform is not familiar to all the colleagues, doesn't look like a good trade.

And note that we were only talking about familiarity. There are still other reasons, like little developers working on a given Solaris derivative, which doesn't give much confidence that any bugs that occur would be fixed, or a concern that the supported hardware list will be long enough.

Annatar · on April 24, 2016

> Could you then use idioms in their proper meaning, please?

eat lunch

eat someone's lunch

Sl. to best someone; to defeat, outwit, or win against someone.

http://idioms.thefreedictionary.com/eat+lunch

...It can refer to market share like you argue, but in a broader sense, this is at least how I understood the term eating someone's lunch.

> So, zones provide better platform because people can go back and learn Solaris? Or what?

No, Solaris (well, illumos, since nobody cares about Solaris any more) provides a better platform for learning and understanding UNIX. And with that comes the insight of writing better code in general, especially on Linux. For example, one is able to write completely portable shell code and deliver cleanly packaged software on Linux if one understands UNIX (in /opt, /etc/opt/, and /var/opt instead of /usr[/local] -- and that is just one trivial example).

zones are a fully working solution because they provide full security isolation, while running as regular processes with their own init(1M), instead of being forced to run Docker inside of KVM or XEN virtual machines to achieve that. And because processes inside of zones are just regular processes in the global zone, they only use as much resources as they need, and no more. With zones, one does not have to hard partition and preallocate memory and CPU to achieve full isolation, as is very often the case with running Docker inside of a virtual machine.

https://www.youtube.com/watch?v=0T2XFSALOaU

And because zones' storage is backed by ZFS, the data is guaranteed end-to-end integrity, and self-healing if the zpool is redundant (RAID1+0, RAIDZ, RAIDZ2).

Writing filesystems is hard: https://danluu.com/file-consistency/

For its part, illumos is the reference system as it is fully POSIX, XPG4, and SUS / UNIX03 compliant, so learning it means learning and mastering the standards.

And, once spoiled by all the nice features in illumos / SmartOS, going back to Linux feels like going back to skins and stone knives.

dozzie · on April 25, 2016

> Sl. to best someone; to defeat, outwit, or win against someone.

> ...It can refer to market share like you argue, but in a broader sense, this is at least how I understood the term eating someone's lunch.

It's still a far stretch to claim that zones win against containers. They could if we limited the field to just technical level, and then again, I would need to see some arguments why they actually are better.

>> So, zones provide better platform because people can go back and learn Solaris? Or what?

> No, Solaris [...] provides a better platform for learning and understanding UNIX.

You're confusing here learning unix with using it.

If one was at the stage of learning, Solaris could be fine, except there's much less material about it for newbies (especially about rough corners and stupid troubles, like "I ran `chmod -R 777 /'" or "I deleted my kernel"), and much, much less knowledgable people accessible to those newbies to learn from.

Now look at professional Linux sysadmins and programmers. They already have some knowledge about their OS and they're not likely to do a hard turn for different OS just for the sake of "understanding UNIX", as it would require quite large effort to just get to the point where they are already with Linux. Solaris would need to provide a really compelling function or feature to justify this effort, and all you said is "portability" (which most probably won't be necessary in most cases) and vague notion of "understanding UNIX".

EDIT: It was a little dishonest on my side to suggest that main selling points of Solaris derivatives were "understanding UNIX" and "writing portabile code". Solaris introduced ZFS and DTrace. But while it is said that ZFS with zones work well together, it still needs explaining why is it a significantly better combination than what Linux provides, and DTrace seems like a very distant benefit, less compelling each time a tool like Sysdig shows up.

Annatar · on April 25, 2016

> They already have some knowledge about their OS and they're not likely to do a hard turn for different OS just for the sake of "understanding UNIX", as it would require quite large effort to just get to the point where they are already with Linux.

How about no priority 1 incidents at 02:03 in the morning, resulting in sleepless nights until 07:30 - 08:30 every day during one's on call, wasted on stupid problems like ext3 filesystem corruption caused by lost writes (again: http://danluu.com/file-consistency/)? Is that a compelling enough reason?

No incidents at dead hours of the night. A programming environment fully supporting standards. Same software which runs on Linux. Guaranteed application binary interface backward compatibility: compile on oldest SunOS 5.# you can find, run on all the versions newer than that one. Change schedulers on the fly, as you need them. Realtime subsystem. kdb and mdb: the best post mortem analysis tools in the industry (what are those system administrators going to do when the kernel crashes so bad that it locks up and doesn't dump a stack trace?) Full security isolation and bare metal performance with zones.

Exhaustive documentation ("he who can read is clearly at an advantage"), much of it applicable to illumos / SmartOS:

http://docs.oracle.com/cd/E19253-01/817-1985/

Should I go on?

dozzie · on April 25, 2016

> How about no priority 1 incidents at 02:03 in the morning, resulting in sleepless nights until 07:30 - 08:30 every day during one's on call, wasted on stupid problems like ext3 filesystem corruption caused by lost writes (again: http://danluu.com/file-consistency/)? Is that a compelling enough reason?

I won't have calls about corrupted ext3, because I use XFS, so not much of a point.

"No priority 1 incidents" is not a function of an operating system, so it can't be a compelling reason to switch. You would still need to elaborate why exactly Linux doesn't work, especially when it works well enough in most admins' perception.

> A programming environment fully supporting standards.

Becasue Linux doesn't support a very important standard of... Could you help me here? Nothing comes to my mind.

> Same software which runs on Linux.

Sweet. I should learn Solaris, so I can run the same software as I currently can on Linux.

Do you hear how ridiculous this point is?

> Guaranteed application binary interface backward compatibility: compile on oldest SunOS 5. you can find, run on all the versions newer than that one.

Only ABI of kernel and (probably) OS-supplied libraries, which is not that much when it comes to what most of the companies run.

> Change schedulers on the fly, as you need them. Realtime subsystem.

Of course, because Linux doesn't allow to change schedulers and has no realtime subsystem.

> kdb and mdb: the best post mortem analysis tools in the industry (what are those system administrators going to do when the kernel crashes so bad that it locks up and doesn't dump a stack trace?)

This is a very distant benefit. One needs to be accustomed with kernel-level code in the first place to make any use of these.

> Full security isolation and bare metal performance with zones.

You get bare metal performance with Linux containers as well, so not a strong point. Only the "full security isolation" could be a sensible reason, but then, it should be proven by close scrutiny. I haven't heard too much about people analyzing Solaris zones, so I'm not likely to be convinced out of thin air that this argument applies.

> Exhaustive documentation ("he who can read is clearly at an advantage"), much of it applicable to illumos / SmartOS:

> http://docs.oracle.com/cd/E19253-01/817-1985/

OK, let's go to booting chapter: http://docs.oracle.com/cd/E19253-01/817-1985/hboverview-2546...

This is barely enough for a trained sysadmin to learn how booting works and where to nudge when things go awry or how to do less standard tasks (e.g. I'm not sure how much time would it take me if I needed to build a custom live CD image; probably quite a lot, judging from boot(1M) man page).

Even if it's exhaustive, it suffers from not having much between step-by-step-and-don't-think instructions and deep documentation that requires prior understanding of the system.

> Should I go on?

You haven't provided many arguments that would be sensible reasons (let alone being strong reasons) to invest in migrating to Solaris.

I think we should disengage from this discussion. If I was in (what I believe is) your state of mind, I couldn't muster any good argument that would convince my interlocutor.

dyn · on April 21, 2016

Hi author here.

Thanks! It was a lot of hard work, I'm glad it's being well received and seems like people are generally finding it useful- the main point of writing it.

I would have liked to talk a lot more about other non-Linux solutions, but that's where most of the NCC Group consulting work has been, which was a main driver for why this info needs to be out there (companies not understanding container security or not "turning it up to 11" in places it needs to be). My background is also mainly Linux, so I stuck to what I know.

Hopefully someone else will release a similar paper on non-Linux solutions/unikernels/hybrid platforms?

heroprotagonist · on April 21, 2016

I see nothing wrong with this. I think it's good to be specific. The Linux implementation is different from many other containerization solutions (eg, Solaris Zones, AIX WPAR, HPUX partitions, etc, so forth). The scope of "all container solutions for every platform" is too broad to be able to provide much depth, and the audience of people interested in container information for all platforms instead of a specific platform is fairly narrow.

There's plenty of room for someone to make a document specific to other platforms for a different audience. There's even room for someone to make a comparison of all platforms. It depends on your goal, your knowledge, and your intended audience.

X86BSD · on April 21, 2016

It's not that the paper is wrong for only focusing on Linux "containers".

It's that the paper, while describing the security short comings of the various solutions, illustrates the echo chamber I refer to.

"We have these security issues, that others do not have, but we are not going to look at how they solved these issues because, well, Linux!"

The paper, IMO, would have better served the audience had it included ideas and/or solutions people outside Linux-land developed to solve some of these issues that were written before Linux even attempted a "container".

You are free to disagree.

Scarbutt · on April 21, 2016

and how do you know the others solutions are really secure? Linux is going to get the most eyes, critics, scrutiny on this because is way more widely used than the other platforms that provide similar technologies.

AdieuToLogic · on April 21, 2016

  and how do you know the others solutions
  are really secure? Linux is going to get
  the most eyes, critics, scrutiny on this
  because is way more widely used ...

This is commonly known as the "given enough eyeballs" fallacy[1]. While it might help when more people have an opportunity to review security critical subsystems, a given OS' popularity certainly does not guarantee this. Heartbleed[2] is often cited as an exemplar for this very situation.

1 - http://www.amazon.com/Facts-Fallacies-Software-Engineering-R...

2 - https://en.wikipedia.org/wiki/Heartbleed

dap · on April 21, 2016

Because not only did the engineers who built it consider how the architecture would ensure security, they wrote all of the considerations and design choices down.[1] Even if they were completely wrong about everything, there's no reason to simply ignore all the thought that's been put into this problem.

[1] https://us-east.manta.joyent.com/bcantrill/public/ppwl-cantr...

i4k · on April 21, 2016

Linux being secure is a fallacy too.

Spender talks about it for a long time: https://grsecurity.net/~spender/interview_notes.txt

Linux developers fix security bugs in irresponsible ways, saying nothing about the attack vector or CVE's involved. Sometimes multiple vulnerabilities are addressed in the same patch, making hard to track the fix.

For more information see the exploit below: https://grsecurity.net/~spender/exploits/64bit_dos.c

The list of kernel vulnerabilities involving namespaces is bigger each day, as the paper shows, but the industry still uses "container" and "security" in the same quotes.

pyvpx · on April 21, 2016

other than something like seL4, no one _knows_ what is really secure. however it would be hard to argue that a decade+ of container systems such as FreeBSD Jail and Solaris' functional equivalent (the precise name escapes me) don't have useful lessons "Linux can learn from".

vacri · on April 21, 2016

> It's just simply frustrating to watch Linux reinvent some wheels, poorly, time after time.

Out of curiosity, why do you think that Linux got such traction and the BSDs didn't? I keep hearing from the BSD camp that the BSDs are better than Linux at, well, pretty much everything, but there must be some reason why BSDs were not as performant for people than Linux distros.

AdieuToLogic · on April 21, 2016

  Out of curiosity, why do you think that
  Linux got such traction and the BSDs didn't?

The AT&T lawsuit[1] against BSDi. Stymieing BSD in a well publicized lawsuit was enough to put a lot of people off from entertaining an x86 BSD (can't blame anyone for being prudent).

Right around the same time, one Mr. Linus Torvalds was making his own OS.

1 - https://en.wikipedia.org/wiki/UNIX_System_Laboratories,_Inc.....

vacri · on April 22, 2016

Thanks for the info, folks. I was unaware of the lawsuit. Curious.

tachion · on April 21, 2016

The lawsuit against involving BSD back in the day was one thing, and another one is... spreading FUD. That's about the fact that BSD's haven't got traction. Of course, relatively to Linux its smaller, or, better say, less visible: Sony Play Station is FreeBSD, Netflix uses FreeBSD for content delivery servers, WhatsApp is FreeBSD, OSX is BSD user land and much, much more, simply less visible.

pyvpx · on April 21, 2016

the first version of Meraki were NetBSD -- same for Apple Airport devices. Junipers management plane has historically been FreeBSD (but that is unfortunately changing)

JamesSwift · on April 21, 2016

As an ignorant spectator I appreciate the balance provided by your comment. However, I wish you would have provided some concrete examples of which wheels they are reinventing, and how those wheels exist in other solutions.

X86BSD · on April 21, 2016

Listen to this discussion[0] which highlights the point I am making. I set the marker for the relevant "container" comment but watch the entire thing to understand more fully what I and others are saying and why.

[0] https://youtu.be/Ya6h2zKlpaQ?t=1h8m11s

i4k · on April 21, 2016

Container technology is grounded on namespaces and cgroups. Both technologies existed long time ago. Linux namespace is a very limited implementation of Plan9's namespace. Solaris and FreeBSD, to cite only the most famous, have resource limits database for process groups.

geggam · on April 21, 2016

This..... this is the thing every architect attempting to roll containers out to production needs to read and re read

Complexity at scale: Orchestration frameworks (Rancher, MESOS/Aurora, Docker Swarm, LXD, OpenStack Containers, Kubernetes/Borg, etc) are only recently catching up to the container craze, there are too many competing models to list. Many have questionable or unaudited security or leave major requirements out, such as secret management. While containers may be easy to get working within a workstation or a few servers, scaling them up to production deployment is another challenge altogether, even assuming your application stack can be properly ``containerized''

kyrra · on April 21, 2016

The opinions stated here are my own, not necessarily those of Google.

From my understanding Google was one of the main contributors of the core features that have allowed containers on Linux. Specifically cgroups and LXC. And they have have been running containers for 10 years: https://research.google.com/pubs/pub43438.html

I'm not sure what you consider new, but it's definitely not super new to Google. I'm guessing your issue is that it's fairly new to being used in production across many companies, so security researchers are just starting to work on finding holes in it.

jsmthrowaway · on April 21, 2016

You are correct. cgroups v1 came from Google (mostly Paul Menage, if I recall my history). OP is in a sense slightly correct without realizing it that Borg predates cgroups themselves, but I'd wager Googlers would agree that they're pretty happily married at this point. Kubernetes just doubles down on what Borg learned in some ways (and discards others).

There's a distinction between containers the concept and Linux containers, of course, but yes. You've got it right.

seabrookmx · on April 21, 2016

Beyond that, there have been containers under Solaris and BSD jails for even longer.

The concept is not new.

jethro_tell · on April 21, 2016

This is true but containers for mortals have lagged behind. Especially in management tools. There was cloudvz. Then cgmanage that couldn't run au pid 1 which has security concerns of its own. It wasn't until systemd that a small team could stand up containers with any type of scale. Basically everything before, more or less, took a proprietary config management setup.

ak217 · on April 21, 2016

We stood up containers, on very large scale, in a PaaS, with a very small team, in 2012 (long before systemd, or Docker for that matter, became a major force).

This was made possible by the work done by Serge Hallyn, Stephane Graber, and others at Canonical. They delivered LXC and its integration with AppArmor, which in turn stood on top of the cgroups/namespaces work that Google contributed to the kernel. Ubuntu 12.04 was the release where the ability to securely run containers at scale became available to mere mortals.

dmpk2k · on April 21, 2016

My experience with Joyent's Triton stack has been both simple and positive. Much of the stuff in Section 10 either doesn't apply to the Triton stack (e.g. their implementation of the Docker server doesn't leave sockets lying around), or has already been solved there (e.g. their container implementation has already been hardened over a decade, and actually works well). I'm obviously quite the fan.

This document is more an indictment of the current state of the Linux container ecosystem, rather than containers themselves. Don't conflate the two.

geggam · on April 21, 2016

Reading more and more of the document ... you might change your mind if you do so :)

dmpk2k · on April 21, 2016

Yes, that's true. I was being a bit unfair.

_0w8t · on April 21, 2016

I do not see what is so difficult about management of secretes. Just add to the command/config file that starts container an option to bind mount file with secretes from the host like -v /path-on-the-host:/mount-point:ro with Docker and then use your favorite automation solution to transfer secrets.

And I do not see how a generic container deployment solution can help with this as distribution of secrets are way too deployment/application specific.

brown9-2 · on April 21, 2016

One issue with this strategy is that you can now only deploy that container/image on hosts with the file containing the image-specific secrets.

If you want to have a large pool of generic hosts that can run any image, then you've just reduced the size of the pool for some images.

If you choose instead to make every secret available to every host, then you've just spread your individual application secrets across many hosts.

_0w8t · on April 21, 2016

Typically secrets imply some per-container state that has to be transferred to the host in any case. Secretes can be transferred as a part of that container-specific state setup.

0xbadcafebee · on April 21, 2016

This is something a security architect/analyst who hasn't looked into containers before should read, but it's not a great revelation on how to deploy containers or secure systems.

Also, how is secret management a major requirement? That term doesn't even exist outside of open source Silicon Valley trendsters, btw. In the enterprise, only people with regulatory requirements use it, and then it's just called password management or credential management. More important requirements are typically the configuration of the services on the network, basic network security & access control, application security, and the security and training of internal users.

geggam · on April 21, 2016

Have fun trying to apply traditional models to the wonderful world of autoscaling containers utlizing service discovery :)

wstrange · on April 21, 2016

> secret management

Kubernetes does have a reasonable solution here: http://kubernetes.io/docs/user-guide/secrets/

tptacek · on April 21, 2016

Exposing /dev/random in containers does not put the system entropy pool at risk. That claim is repeated twice in the paper, and is false.

dyn · on April 21, 2016

Hi author here. Are you saying it doesn't allow a malicious/compromised container to reduce the pool size down to something which might cause other daemons to block, possibly indefinitely, which need "strong" entropy vs using /dev/urandom? Sure it's a really silly DoS, but it might matter in the right situations.

mfukar · on April 21, 2016

It will never stop being repeated, will it?

josh-wrale · on April 21, 2016

Could you link to supporting reference? Is there some smarts in the kernel to namespace, block or limit draining?

zuzun · on April 21, 2016

I assume the original author meant to say: draining /dev/random does not affect the cryptographic quality of the random numbers produced by both devices.

But yes, since there's only one entropy pool, attackers can drain /dev/random, causing other programs that rely on /dev/random to block.

All I can say is: on newer kernels, attackers can still drain the pool by using the getrandom syscall, so unless you block that syscall, not mounting /dev/random does not increase the security.

dyn · on April 21, 2016

Hi author here.

Yes, that is what I was trying to say. I'll clarify it in a future version.

That's a good point, I'll include that! Thanks.

throwaway2048 · on April 21, 2016

"Draining" is a fantasy, statistics about it are manufactured by the kernel randomness subsystems. Are you concerned with your ssh keys "running out" too?

raesene6 · on April 21, 2016

excuse my potential ignorance, but I thought that /dev/random was at risk of exhaustion in the general case, with /dev/urandom not being so...

e.g. http://www.onkarjoshi.com/blog/191/device-dev-random-vs-uran... or http://security.stackexchange.com/a/14293/37

cyphar · on April 21, 2016

I got schooled about this a few days ago. Here's the thread: https://news.ycombinator.com/item?id=11485832. The tl;dr is that basically you should just use /dev/urandom unless you're in very weird circumstances. Entropy doesn't "run out" in that sense.

raesene6 · on April 21, 2016

oh yes I agree /dev/urandom is the way to go, but the original comment said that the paper's suggestion that a container could exhaust the pool on /dev/random was false and that's what I was curious about. I thought it was possible to exhaust /dev/random...

cyphar · on April 21, 2016

I think all of the ancestors where saying that you couldn't use containers to "reduce the entropy" (which probably is meant to be "cause CSPRNG state to be known") won't happen because /dev/random blocking is done by some arbitrary statistic that has dubious reasons for existing. True, you can cause /deb/random to block, but that doesn't result in anything bad happening. Unless you have bad software that does bad things when reads block.

ben_hall · on April 21, 2016

I may be wrong, but the container could drain the pool stopping applications like Tomcat from launching

nicolast · on April 21, 2016

> and a number of other systems such as Mirage OS (a reimplementation of Solaris Zones).

I'd say that's not entirely correct. At all.

amirmc · on April 21, 2016

Yeah, there are a number of statements about unikernels that are ... somewhat inaccurate.

dyn · on April 21, 2016

Hi author here. I expected some folks who know a lot more about unikernels might have some feedback. I wanted to cover them because I think they're an interesting from a security perspective/they relate to containers but will admit I don't have much experiance with them. If you have time, I'm happy to fix the inaccuracies, feel free to DM me on twitter / email me? (first.last@nccgroup.trust) or (twitter.com/@dyn___). Thanks.

dyn · on April 21, 2016

Hi author here, I'll admit I'm not super familiar with Mirage OS, I've only used it a bit, but I wanted to include some discussions of it. I had read it was related (somewhere I can't remember) but I'm happy to be corrected.

floatboth · on April 21, 2016

Also:

> Joanna is also a core contributor and author of the high security Xen hypervisor desktop "CubesOS".

it's Qubes, not Cubes. Spelled correctly right in the URL the paper link to.

dyn · on April 21, 2016

Author here. Oops, will fix, thanks.

molecule · on April 21, 2016

@ 100+ pages, it would be great to get this in mobi or epub format.

ishtu · on April 21, 2016

"Containers are great. It's a shame Linux doesn't have any."

tyingq · on April 21, 2016

"While Linux Container systems (LXC, Docker, CoreOS Rocket, etc) have undergone fast deployment and development, security knowledge has lagged behind. The number of people focused on container security...seems disproportionately small"

I agree with this part. Most containers aren't running as an unprivileged user. Those environments that do support it only support it in a very limited set of os/kernel/whatever versions. Somewhat concerning since containers are getting traction almost everywhere.

cyphar · on April 21, 2016

Oddly enough, quite a few people in the runC community (including myself) are working on implementing the ability to start containers without root. If we can get this to work, it will be brought to Docker and you'll be able to start containers without even needing a daemon running as root (although you'd lose some functionality due to deficiencies in some of the kernel interactions with user namespaces -- but it should be more secure than it is now). It does bother me that the Linux kernel community entirely ignored other container implementations.

tyingq · on April 21, 2016

Yes, I've been following the progress, and I see the disconnects across the space. Like this issue: https://github.com/systemd/systemd/issues/321

Appreciate your efforts. It is good to know that there are people pushing to get this working.

cyphar · on April 23, 2016

If you're interested, I've got a WIP branch of runC that actually implements working rootless containers. This is really exciting. I'll be writing a blog post soon.

https://github.com/opencontainers/runc/pull/774

kazinator · on April 21, 2016

Harden this container!

http://regex.info/i/pic/2005-09-06_16:03.28__00002.jpg

Ideas: Cross-linked polymer? Switch to glass? Too heavy. Carbon fiber, epoxy resin composite?

0xbadcafebee · on April 21, 2016

A unix hacker would wrap it tightly in plastic wrap. Once you're done with the application you can re-use the wrapper.

kazinator · on April 21, 2016

I'm with you: plus the plastic wrap is a small self-contained tool that sort of does one job half-well.

stcredzero · on April 21, 2016

Has there been a writeup of the history of Container technology? How did these features find their way into the Linux kernel in the first place? What was the original motivation? (Sandboxing, I would guess.) Who were the stakeholders who pushed those features, and why?

It's weird and wonderful that such technology infrastructure arose and such innovation happens. I'm curious how.

_0w8t · on April 21, 2016

The article contains the best explanation of Linux capabilities that I have seen.

Too bad that Docker and other container solutions defaults to rather broad set of capabilities requiring to use things like

    --cap-drop=ALL --cap-add=NET_BIND_SERVICE ...

with typical container invocations to minimize a chance of container escape.

xori · on April 21, 2016

And now I wait for a poor soul to summarize those 100 pages in a paragraph or two.

nisa · on April 21, 2016

TL;DR: Pay attention to arcane details or don't use containers for security related stuff. Also grsecurity.

Honest question: Are illumos zones or FreeBSD jails better designed or is just nobody looking for kernel bugs there? From a usability perspective both win (IMHO) against this Linux mess.

This document is really great through.

dyn · on April 21, 2016

Author of the paper here. Thanks. As for Jails and Illumos, I would be willing to bet it's people not looking, but I haven't looked, so I can't really say. I agree it's kind of a mess, but it's getting better!

Annatar · on April 21, 2016

You would lose that bet so fast your head would be both spinning and smoking: zones have been hardened and worked on for enterprises since 2006, and in ten years have had three known vulnerabilities, the last two having already been fixed in illumos (and not exploitable without being able to be explicitly run by a user inside of a hypervisor).

As Bryan Catrill has said in one of his talks:

"we walked the trail if tears since our customers were very large companies; if they had a problem, we had a problem!"

The illumos and smartos mailing lists are hyperactive, with bugs being fixed, and new functionality added, which even Oracle Solaris doesn't have -- just subscribe to those two mailing lists and see for yourself. I warn you in advance: be prepared to be buried under the vortex of e-mails.

droopybuns · on April 21, 2016

Invoking the Trail of Tears to describe development hardship is, I think, inappropriate. Wish that guy would be a bit more cautious with his metaphors.

solarengineer · on April 21, 2016

Given that you have affixed your name and have now heard of Jails and of Illumos (and SmartOS), you may want to consider amending your paper to state as such.

Also, Illumos was forked off Solaris, and I'm sure that you know of Solaris' security.

Looking forward to your amendment and revised paper.

ktzar · on April 21, 2016

Is there a chance a repo with a .md version or an epub can be shared? It's quite hard to read a 100+ PDF document without printing it.

voidz · on April 21, 2016

I'd like this too. An .epub would be nice for e-ink readers like Pocketbook, Kobo, and Kindle.

voidz · on April 21, 2016

Is it possible to get notified (via email) when this paper has been updated?

X86BSD · on April 21, 2016

My username gives away my bias however, I think you will find Jails and the work Joyent/Illumos have done with containers, to be actually engineered secure from the getgo. Linux, well, I think it's obvious the route Linux took. And none of their "container" solutions were ever designed with security as the starting point. It was always bolted on as an after thought.

Trust me people have been looking for kernel bugs in FreeBSD to exploit jails since it was created. The record their speaks for itself. It's not a lack of eyeballs.

phaemon · on April 21, 2016

Jails are a powerful tool, but they are not a security panacea. While it is not possible for a jailed process to break out on its own, there are several ways in which an unprivileged user outside the jail can cooperate with a privileged user inside the jail to obtain elevated privileges in the host environment.

floatboth · on April 21, 2016

Everyone says that, and that's right, but does that situation happen often? A typical "cloud" service like Heroku or Travis CI does NOT give you an unprivileged user outside the jail!

cyphar · on April 21, 2016

Unless you consider someone who managed to break into a Heroku box but doesn't have root. They spin up a Heroku instance and now you have root. That's a security flaw.

somethingnew · on April 21, 2016

Here's a summary from the Docker perspective https://blog.docker.com/2016/04/docker-security/

jsmthrowaway · on April 21, 2016

The table from the report and Docker's championing thereof are also (nearly flagrantly) misleading, since Docker only supports image signing if you use the public hub. You cannot (repeat: cannot) sign Docker containers any other way, so it's barely a half feature and does not work for enterprises at all. But it says "strong defaults" in their table when describing this oddly useless feature, since enterprises are the ones most likely to invest in a serious key infrastructure and actually use signing:

https://docs.docker.com/engine/security/trust/content_trust/

> Content trust is currently only available for users of the public Docker Hub. It is currently not available for the Docker Trusted Registry or for private registries.

> Currently, content trust is disabled by default. You must enable it by setting the DOCKER_CONTENT_TRUST environment variable.

How is the complete lack of image verification until explicitly enabled, and only on public images, a "strong default"?

I'm also mystified by the row for SELinux, where rkt has Optional in scary yellow for some reason, and the other two do not. I suspect that table was the whole point of the independent review and it is fulfilling its purpose handily for Docker. I haven't even read the report and I can identify four suspicious discrepancies in that table alone.

ETA: Compare how the author describes Docker to rkt (it's rkt, not Rkt, too): http://imgur.com/a/D6nEw

dyn · on April 21, 2016

Hi, author of paper here. Couple things: - I agree the requirement of using Docker hub or their private hub is an unfortunate requirement, but I'm talking purely about the technical implementation. "Does not work for enterprises at all", well, there are a number of large enterprises that would disagree with you ;)

- For the table w/ SELinux row, Rkt is optional "scary" yellow because it only supports a single MAC implementation, it isn't very portable, and it's not enabled by default, vs LXC and Docker which both have quite strong MAC policies by default. Trying to parse all the info down into a table was honestly quite difficult (balancing being able to read it without a million footnotes for each point). Hopefully readers don't take all their take-aways from the table, and read the paper in full.

- I used Rkt for the name and rkt for the command. Seemed to help consistency.

Thanks for your feedback.

mjg59 · on April 21, 2016

(Disclaimer: I added SELinux support to CoreOS)

I'm a little confused around the SELinux issue. SELinux is inherently unportable - each distribution has its own policy (generally based on refpolicy, but sometimes fairly divergent), and it's basically impossible for an application to ship a policy that's compatible with more than one distribution. Rkt's SELinux design inherits from SVirt in such a way that in most cases it'll just work with a distribution's existing SELinux policy. It's fair to say that the number of distributions that ship policy that works with Docker is larger than for rkt, but this is fundamentally about distribution priorities rather than technological choices. On Fedora, rkt should provide identical SELinux confinement to Docker - on CoreOS it'll be better, since we support SELinux on overlayfs as well. Whether SELinux is enabled or not is (again) a distribution choice. Fedora ship with SELinux enabled by default, and both rkt and Docker will use it as a result.

jsmthrowaway · on April 21, 2016

Thanks for responding. I'm talking about the technical implementation, too. How is a feature hidden behind an environment variable a "strong default?"

And Docker simply screenshot the table, so noble goal, but...

I suppose I could remark upon MESOS, Rkt, and so on, and how getting names right is important because it characterizes the rest of your thoughts and analysis of the things you're studying, but I'll stick with the question I started with here.

bigmac · on April 21, 2016

You cannot (repeat: cannot) sign Docker containers any other way, so it's barely a half feature and does not work for enterprises at all.

What makes you think this? It is 100%, patently false. Private notary servers can be deployed alongside private registry servers without problem. See here for docs on how to do it: https://github.com/docker/notary/blob/master/docs/running_a_...

jsmthrowaway · on April 21, 2016

> What makes you think this?

The docs I linked and quoted, written by your own organization and helpfully pasted into the point I made?

I'm glad to see it's possible (if cumbersome), but I ruled Docker out for this purpose based on the exact link I just pasted. I also followed up and didn't see a "hey, you can sign private registries" bullet in your blog post responding to this paper, or much of anywhere, and Googling "docker sign private registries" doesn't go anywhere.

I'm still unsure why I'd stand up several daemons to accomplish signing a file, but that's a side point.

----

ETA: I can no longer reply because I've burned my precious HN comment budget commenting upon this paper (sorry, blame HN), so here's what I would reply to you downthread:

> 1. We have to host the signatures somewhere, so we host them in a store we call the notary server.

We've had this solved for a long time with .asc files, and Docker is already shipping an HTTP server or six. Shit, extend the Docker format and put the signature on each layer. There's a lot of prior art from RPM and dpkg in particular on how this can be done without writing yet another Docker daemon to run.

I'm sorry, I have to call bullshit, here. Docker is a very strong daemon-for-everything engineering culture, and that's the only reason it exists. It's also why folks are competing with you, because there are three or four different daemons in the Docker ecosystem that simply should not exist. Including dockerd.

> Think serving an outdated container with known-vulnerable software. Sadly, most artifact signing systems do not mitigate this attack today,

Because it's out of scope of a signature. That is conflating a signature with content revocation, which is a different problem altogether. A signature is an attestation of certain properties of data, and "is no longer valid content because circumstances changed after it was signed" is not one of them. The validity of the content is orthogonal to its signature. That known-vulnerable container is still a valid signature, and it's overreaching to expect a signature system to solve that problem. That is solvable in other ways.

Known-bad OpenSSL is still signed in repositories. And valid. And that's fine, because it's a separation of concerns; you get non-repudiation, integrity, all that stuff from a signature scheme. Upgrading to gatekeeping content on top of signatures indicates to me a fundamental misunderstanding of the problem ("can I run this?" instead of "this is an authenticated, intact image that came from where I expect"), which concerns me. You can solve the problem you present in other ways.

Mixing in the term "replay attack" is extremely confusing and I think diluting your point, because it is baffling me and really does not apply to what you are saying.

shykes · on April 21, 2016

Disclaimer: I work at Docker.

Most of your points are a criticism of TUF, of which Notary and Docker Content Trust are an implementation. Based on your comments I believe you're not familiar with TUF and the scope of problems it solves. Here's a good resource to learn more about it: https://theupdateframework.github.io

You clearly are not a fan of Docker and I respect your opinion, I don't really want to engage in that aspect of the discussion. Now, on the specific topic of secure content distribution, I hope you won't let your bias against Docker get in the way of understanding the benefits of TUF. It does improve the state of the art in secure content distribution, and you should really take the time to understand it and perhaps revisit some of your opinions. We're leveraging it in Docker and sharing our implementation, but you don't have to use Docker to use Notary or TUF.

If after reading about TUF you have specific criticism of it, I would be interested to hear about it.

jsmthrowaway · on April 21, 2016

My criticism is that a digital signature isn't enough for you. If I want to integrate into TUF, I can. If I don't and solve what TUF does another way, well, Docker said I'll use TUF. So I'll use TUF. Your position is that a digital signature is not useful in itself. This is wrong. It is.

Let's write a spec:

- Verify integrity and authenticity of a Docker image

The logical implementation:

- Digital signatures, detached or otherwise

Your implementation:

- Multiple complex, daemonic systems to reinvent software updating and, incidentally, signatures based on TUF

Your rationale:

- Digital signatures are not useful alone

So those of us who are aware of the limitations are left out in the cold, because we can't point gpg at a Docker image and just get the problem done. We have to learn this entire system Docker has created that's going to bring a grand unified software updating future. Maybe I have my own Omaha updater already. Maybe I just want dockerd to validate a signature. It is your prerogative to steer Docker toward crafting novel daemon engineering for every possible scenario, but that's the criticism I'm going to levy, whether you want to engage it or not.

The fundamental problem here is composability versus platform. My critique is not of TUF, of which I am not only familiar but excited. My critique is that organizationally at Docker, you take a problem like "sign an image," which is a perfectly useful primitive in every software distribution system on the planet, and say "that's not enough. We need a platform." You are dictating how my updating infrastructure works and then saying you've solved signatures. Which is technically accurate, I suppose.

I'm also pretty much over critique of Docker being cast as my not getting and/or understanding it. Believe me, Solomon, I get it, and I understand that you want to caricature everyone who disagrees with your strategy as biased against you. (That's actually the third time I can recall you making my criticism of Docker personal. I have no anti-Docker bias. I believe others are implementing what you're working on better and you've simply got the warchest, which is vastly different than having a bias. I used the shit out of ZeroRPC and I've respected a whole lot of your work since then. Come on.)[0]

We're talking about signing a file. Signing. A file. Which I cannot do without a whole shitload of infrastructure that I do not want (including MySQL, apparently), which is a systemic issue with Docker all the way back to dockerd.

Edit: [0]: 484 days ago we discussed exactly this, and here we are again, condescending criticism: https://news.ycombinator.com/item?id=8789181

bigmac · on April 21, 2016

Responding to your edit:

Notary, the underlying project that implements Docker's Content Trust feature, is an implementation of The Update Framework (TUF). Generally, you want a software update system to deal with a whole host of issues. Just solving "is this content signed" actually achieves very little. Survivable key compromise, freshness guarantees, resilience against mix-and-match attacks are all critical to building a system that actually meets real-world use cases and attacks. Threshold signing and signing delegation are additional features that you get when using TUF, which help with splitting the ability to sign across multiple individuals or systems.

You seem to be interested in this topic. I recommend you read a couple of papers to get some more background on why TUF exists and what problems it solves. A key point would be to understand why TUF deals with signed collections of software instead of just individual signed objects.

Start here to get an overview of The Update Framework:

1. Overview: https://theupdateframework.github.io/

2. Specification: https://github.com/theupdateframework/tuf/blob/develop/docs/...

Existing package managers and their shortcomings are covered in these two papers:

1. https://isis.poly.edu/~jcappos/papers/cappos_pmsec_tr08-02.p...

2. https://isis.poly.edu/~jcappos/papers/cappos_mirror_ccs_08.p...

bigmac · on April 21, 2016

The several daemons serve two purposes:

1. We have to host the signatures somewhere, so we host them in a store we call the notary server.

2. Notary has a concept of timestamping, so we spin up a timestamping server alongside a notary server that can guarantee the freshness of the data. We use a separate server so that folks can segment the timestamp signing functionality from the signature metadata serving functionality. This helps allow separation of concerns.

Timestamping is important because it can help prevent replay attacks where old, validly signed data is served to clients. Think serving an outdated container with known-vulnerable software. Sadly, most artifact signing systems do not mitigate this attack today, but we wanted to make sure ours would.

bigmac · on April 21, 2016

Sorry about that; I will get that page of the docs fixed.

Open invitation to anyone here: Our implementation of TUF via notary has been serving us well. If you decide to try it out and run in to any snags let me know and I can help you with getting it up and running. Contact info can be found in my profile.

mikecb · on April 21, 2016

There's TL;drs at the end of some sections.

otterley · on April 21, 2016

The whole paper is worth reading and is largely factually and historically correct. People often pay analysts hundreds of dollars for papers like this. Consider it a gift.

jsmthrowaway · on April 21, 2016

> The whole paper is worth reading and is largely factually and historically correct.

I found six inaccuracies in as many minutes after opening the PDF and scrolling to random pages. So I'm not sure that's true. I'm also not the only one making that claim, `spender has too:

https://twitter.com/grsecurity/status/722935691114512385?lan...

I don't want to knock the paper too hard without actually sitting down and reading it (which I'm going to do tonight, and I won't dump raw notes on HN until I've given the paper a serious chance), but I'm not encouraged right off the bat. I've been deep in the rkt integration hole and a lot of stuff referring to rkt is very rough and surface-level and, in five of the cases I mentioned, blatantly factually inaccurate. Same with Docker, too, actually, though the paper is quite obviously partisan (see http://imgur.com/a/D6nEw for example).

dyn · on April 21, 2016

Hi, Author of the paper here. After seeing the email Spender sent me, I can say most of his fixes/recommendations don't change a lot of the core messages/points/etc, even on grsec related sections. I'll be releasing a new version soon-ish merging in some of his feedback.

I tried extremely hard to not be "partisan", and I don't think I am kind to any container platform, but it's hard to argue where Docker is vs Rkt in terms of security (apart from possibly hw virtualization in Rkt Stage 1). I agree some of the Rkt stuff is higher level, mostly because after a large number of container assessments at some major companies, I have yet to come across Rkt. Most of my research comes from my own brief analysis, and the analysis of some peers. Maybe a future version will cover it more in-depth.

lawnchair_larry · on April 21, 2016

Despite the criticisms, this is a much needed analysis in this space, and looks very thorough. I've met countless development teams jumping in to these stacks and trying to find good security advice, or some kind of whitepaper to spell it all out. Looking forward to the updated version, and I believe this will help a lot of people with their projects.

fabulist · on April 21, 2016

This is not a sound security strategy.