
Slightly off topic. What are people doing about user data persistence on the cloud/Docker? Specifically, we are porting a desktop application to the cloud via application streaming technology with Docker, but we would like the user’s data and preferences to not go "poof" when the cloud instance disappears. Ideally, we would like some automagic way to attach, say, the user's Dropbox account or the equivalent to the cloud instance. Is anyone working on that problem?


Kubernetes has persistent volumes: http://kubernetes.io/docs/user-guide/persistent-volumes/

You can use something like GlusterFS to distribute your files, or it can hook into your cloud provider and create persistent storage automatically on something like EBS.
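
For what it's worth, a claim can also be created programmatically. Here's a rough sketch with the Python kubernetes client; the claim name, size, and namespace are placeholders, and it assumes your cluster's default StorageClass can dynamically provision something like EBS:

    # Rough sketch, assuming the Python "kubernetes" client and a cluster whose
    # default StorageClass can dynamically provision volumes (e.g. EBS).
    from kubernetes import client, config

    config.load_kube_config()  # use the local kubeconfig
    core = client.CoreV1Api()

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="user-data"),  # placeholder name
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            resources=client.V1ResourceRequirements(
                requests={"storage": "1Gi"}  # placeholder size
            ),
        ),
    )
    core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)

Pods then reference the claim by name and see the same data again wherever they are rescheduled.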


When I tried it way back when, the trouble I had with Gluster is that it didn't readily handle nodes randomly leaving and new ones joining. It was more hardware-centric, in that if a node left, you were expected to bring that specific node back online. Is that still the case?


I don't have direct experience with any of these, but the Docker approach here is that Docker exposes a Volumes Plugin API[1], which allows third parties to implement portable volume plugins that let a volume persist across hosts and across containers.

A bunch of plugins have been implemented[2], but I haven't personally heard any real success stories of people using them, which doesn't necessarily mean such stories don't exist.
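
To illustrate the mechanism (this is not from [1]; it assumes the Python docker SDK, and "some-volume-plugin" is a made-up driver name): you create a volume through the third-party driver and mount it into a container, and the plugin is responsible for making the data outlive both the container and the host.

    # Hypothetical sketch using the Python docker SDK; "some-volume-plugin"
    # stands in for whichever third-party volume driver is installed.
    import docker

    client = docker.from_env()

    # The plugin, not the local host, decides where the data actually lives.
    client.volumes.create(name="user-data", driver="some-volume-plugin")

    client.containers.run(
        "my-app:latest",  # placeholder image
        detach=True,
        volumes={"user-data": {"bind": "/data", "mode": "rw"}},
    )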

On a separate note, many other teams run stateful services that can handle the complete loss of a node's data. For example, it seems popular to run Elasticsearch in Docker, though again, I'm still learning about this pattern, too.

[1] https://docs.docker.com/engine/extend/plugins_volume/

[2] https://github.com/docker/docker/blob/master/docs/extend/plu...


NFS volumes. There are many storage solutions out there for containers.


I think this is where the idea of co-locating compute and storage really shines (e.g. Joyent/Manta, scaleableinformatics.com). Moving all state to NFS sounds like a great way to widen the performance gap between RAM and permanent storage... Not to mention that if you really want to spread out your data and your application, you would need NFS over VPN (unless encrypted NFSv4 actually works now?).


That's a good point. Docker Swarm supports 'labels' for pinning your containers to a specific set of agents.
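
In Swarm mode that looks roughly like labelling the nodes that have the storage attached and constraining services to them. A sketch with the Python docker SDK (the node hostname, label, and image are made up):

    # Sketch, assuming the Python docker SDK and Swarm mode; the node hostname,
    # label, and image are placeholders.
    import docker

    client = docker.from_env()

    # Label the node that has the persistent storage attached.
    node = client.nodes.get("storage-node-1")
    spec = node.attrs["Spec"]
    spec.setdefault("Labels", {})["storage"] = "local-ssd"
    node.update(spec)

    # Pin the service to nodes carrying that label.
    client.services.create(
        "my-db:latest",
        name="user-db",
        constraints=["node.labels.storage==local-ssd"],
    )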


A few possibilities:

- If you don't actually need files, design around a database instead

- If you need to store files redundantly, you can use a distributed filesystem like HDFS or GridFS

- If you need to store files redundantly, you can use an object storage service like Amazon S3

- If you need to store actual files (e.g. for hard-linking; redundancy optional), you can use network-mounted storage

- If periodic backups of the data are OK, you can run backups that ship to S3 or Glacier (see the sketch after this list)

Just a few offhand; I'm sure there are a number of other techniques.
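
For the backup option, a minimal sketch with boto3 (the bucket name and paths are placeholders; run it from cron or a sidecar on whatever schedule suits you):

    # Minimal sketch: ship a backup archive to S3 with boto3.
    # Bucket name and file paths are placeholders.
    import time
    import boto3

    s3 = boto3.client("s3")
    key = "backups/userdata-%d.tar.gz" % int(time.time())
    s3.upload_file("/tmp/userdata.tar.gz", "my-backup-bucket", key)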


> If you don't actually need files, design around a database instead

That is fine if your target is, e.g., AWS, but if you want all your infrastructure in Docker, the DB needs persistent storage...


That's what volumes are for -- put a volume on NFS and have your DB persist there.

Even with Docker you still need roles for servers; your DB class of servers will have a cloud-config that inits the correct mount points.
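
As a concrete sketch (Python docker SDK, the built-in 'local' driver with NFS options; the server address, export path, and image are placeholders):

    # Sketch: a named volume backed by NFS via the built-in "local" driver,
    # mounted as the Postgres data directory. Address and paths are placeholders.
    import docker

    client = docker.from_env()

    client.volumes.create(
        name="pgdata",
        driver="local",
        driver_opts={
            "type": "nfs",
            "o": "addr=nfs.example.internal,rw",
            "device": ":/exports/pgdata",
        },
    )

    client.containers.run(
        "postgres:9.5",
        detach=True,
        volumes={"pgdata": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
    )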


Can you really run a transaction-oriented DB over NFS "in the cloud" with meaningful performance and guaranteed writes to disk in case of network/power failure?


I'm interested in the streaming technology you mention. Is this something open source or supported natively by Docker?


@amasad, my colleague created this, which may interest you: https://github.com/Unidata/cloudstream


Thank you!



