
Slightly off topic. What are people doing about user data persistence on the cloud/Docker? Specifically, we are porting a desktop application to the cloud via application streaming technology with Docker, but we would like the user’s data and preferences to not go "poof" when the cloud instance disappears. Ideally, we would like some automagic way to attach, say, the user's Dropbox account or the equivalent to the cloud instance. Is anyone working on that problem?


Kubernetes has persistent volumes: http://kubernetes.io/docs/user-guide/persistent-volumes/

You can use something like GlusterFS to distribute your files, or it can hook into your cloud provider and create persistent storage automatically on something like EBS.
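
For what it's worth, a claim can also be created programmatically. Here's a rough sketch with the Python kubernetes client; the claim name, size, and namespace are placeholders, and it assumes your cluster's default StorageClass can dynamically provision something like EBS:

    # Rough sketch, assuming the Python "kubernetes" client and a cluster whose
    # default StorageClass can dynamically provision volumes (e.g. EBS).
    from kubernetes import client, config

    config.load_kube_config()  # use the local kubeconfig
    core = client.CoreV1Api()

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="user-data"),  # placeholder name
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            resources=client.V1ResourceRequirements(
                requests={"storage": "1Gi"}  # placeholder size
            ),
        ),
    )
    core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)

Pods then reference the claim by name and see the same data again wherever they are rescheduled.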


When I tried it way back when, the trouble I had with Gluster is that it didn't readily handle nodes randomly leaving and new ones joining. It was more hardware-centric, in that if a node left, you were expected to bring that specific node back online. Is that still the case?


I don't have direct experience with any of these, but the Docker approach here is that Docker exposes a Volumes Plugin API[1], which allows third parties to implement portable volume plugins that let a volume persist across hosts and across containers.

A bunch of plugins have been implemented[2], but I haven't personally heard any real success stories of people using them, which doesn't necessarily mean such stories don't exist.
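
To illustrate the mechanism (this is not from [1]; it assumes the Python docker SDK, and "some-volume-plugin" is a made-up driver name): you create a volume through the third-party driver and mount it into a container, and the plugin is responsible for making the data outlive both the container and the host.

    # Hypothetical sketch using the Python docker SDK; "some-volume-plugin"
    # stands in for whichever third-party volume driver is installed.
    import docker

    client = docker.from_env()

    # The plugin, not the local host, decides where the data actually lives.
    client.volumes.create(name="user-data", driver="some-volume-plugin")

    client.containers.run(
        "my-app:latest",  # placeholder image
        detach=True,
        volumes={"user-data": {"bind": "/data", "mode": "rw"}},
    )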

On a separate note, many other teams run stateful services that can handle the complete loss of a node's data. For example, it seems popular to run Elasticsearch in Docker, though again, I'm still learning about this pattern, too.

[1] https://docs.docker.com/engine/extend/plugins_volume/

[2] https://github.com/docker/docker/blob/master/docs/extend/plu...


NFS volumes. There are many storage solutions out there for containers.


I think this is where the idea of co-locating compute and storage really shines (e.g. Joyent/Manta, scaleableinformatics.com). Moving all state to NFS sounds like a great way to widen the performance gap between RAM and permanent storage... Not to mention that if you really want to spread out your data and your application, you would need NFS over VPN (unless encrypted NFSv4 actually works now?).


That's a good point. Docker Swarm supports 'labels' for pinning your containers to a specific set of agents.
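
In Swarm mode that looks roughly like labelling the nodes that have the storage attached and constraining services to them. A sketch with the Python docker SDK (the node hostname, label, and image are made up):

    # Sketch, assuming the Python docker SDK and Swarm mode; the node hostname,
    # label, and image are placeholders.
    import docker

    client = docker.from_env()

    # Label the node that has the persistent storage attached.
    node = client.nodes.get("storage-node-1")
    spec = node.attrs["Spec"]
    spec.setdefault("Labels", {})["storage"] = "local-ssd"
    node.update(spec)

    # Pin the service to nodes carrying that label.
    client.services.create(
        "my-db:latest",
        name="user-db",
        constraints=["node.labels.storage==local-ssd"],
    )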


A few possibilities:

- If you don't actually need files, design around a database instead

- If you need to store files redundantly, you can use a distributed filesystem like HDFS or GridFS

- If you need to store files redundantly, you can use an object storage service like Amazon S3

- If you need to store actual files (e.g. for hard-linking; redundancy optional), you can use network-mounted storage

- If periodic backups of the data are OK, you can run backups that ship to S3 or Glacier (see the sketch after this list)

Just a few offhand; I'm sure there are a number of other techniques.
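
For the backup option, a minimal sketch with boto3 (the bucket name and paths are placeholders; run it from cron or a sidecar on whatever schedule suits you):

    # Minimal sketch: ship a backup archive to S3 with boto3.
    # Bucket name and file paths are placeholders.
    import time
    import boto3

    s3 = boto3.client("s3")
    key = "backups/userdata-%d.tar.gz" % int(time.time())
    s3.upload_file("/tmp/userdata.tar.gz", "my-backup-bucket", key)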


> If you don't actually need files, design around a database instead

That is fine if your target is, e.g., AWS, but if you want all your infrastructure in Docker, the DB needs persistent storage...


That's what volumes are for -- put a volume on NFS and have your DB persist there.

Even with Docker you still need roles for servers; your DB class of servers will have a cloud-config that inits the correct mount points.
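
As a concrete sketch (Python docker SDK, the built-in 'local' driver with NFS options; the server address, export path, and image are placeholders):

    # Sketch: a named volume backed by NFS via the built-in "local" driver,
    # mounted as the Postgres data directory. Address and paths are placeholders.
    import docker

    client = docker.from_env()

    client.volumes.create(
        name="pgdata",
        driver="local",
        driver_opts={
            "type": "nfs",
            "o": "addr=nfs.example.internal,rw",
            "device": ":/exports/pgdata",
        },
    )

    client.containers.run(
        "postgres:9.5",
        detach=True,
        volumes={"pgdata": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
    )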


Can you really run a transaction-oriented DB over NFS "in the cloud" with meaningful performance and guaranteed writes to disk in case of network/power failure?


I'm interested in the streaming technology you mention. Is this something open source or supported natively by Docker?


@amasad, my colleague created this, which may interest you: https://github.com/Unidata/cloudstream


Thank you!



