> this fact and the use of raft in production has caused real, large scale network outages.

Paxos too. I remember a full GCP outage that had something to do with Paxos, and although I can't find the write-up now, I thought there was also a nasty bug in ZooKeeper's Paxos-style consensus implementation (Zab).

That isn't to say any of these are perfect or bug-free. They're built by humans, and we're going to make mistakes. But in my experience implementing both, I ended up with a working Raft implementation, while Paxos baked my brain until I gave up.

I think everyone uses Raft _because_ it's possible for a working developer to implement, so there are many implementations, and it's easier to understand which phase each node is in at any given moment.
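As a rough illustration of why Raft's phases are easy to follow: each node is always in exactly one of three roles (follower, candidate, leader), and only a handful of events move it between them. The sketch below is simplified (no log replication, no vote bookkeeping), and the `Event` names are hypothetical labels for illustration, not part of any Raft library.

```python
from enum import Enum, auto


class Role(Enum):
    FOLLOWER = auto()
    CANDIDATE = auto()
    LEADER = auto()


class Event(Enum):
    # Hypothetical event names for illustration; not from any Raft library.
    ELECTION_TIMEOUT = auto()    # no heartbeat from a leader within the timeout
    WON_MAJORITY = auto()        # candidate received votes from a majority
    SAW_HIGHER_TERM = auto()     # any RPC carried a term greater than ours
    SAW_CURRENT_LEADER = auto()  # candidate got AppendEntries from a leader of its term


def next_role(role: Role, event: Event) -> Role:
    """Return the role a Raft node moves to after an event (simplified)."""
    # Seeing a higher term always demotes a node to follower.
    if event is Event.SAW_HIGHER_TERM:
        return Role.FOLLOWER
    if role is Role.FOLLOWER and event is Event.ELECTION_TIMEOUT:
        return Role.CANDIDATE
    if role is Role.CANDIDATE:
        if event is Event.WON_MAJORITY:
            return Role.LEADER
        if event is Event.SAW_CURRENT_LEADER:
            return Role.FOLLOWER
        if event is Event.ELECTION_TIMEOUT:
            return Role.CANDIDATE  # split vote: start a new election in a new term
    # Every other (role, event) pair leaves the role unchanged.
    return role


# A follower times out, wins an election, then steps down on seeing a newer term.
role = Role.FOLLOWER
for ev in (Event.ELECTION_TIMEOUT, Event.WON_MAJORITY, Event.SAW_HIGHER_TERM):
    role = next_role(role, ev)
```

The whole state machine fits in one small function, which is a big part of why Raft is approachable compared with reasoning about Paxos proposers and acceptors.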

I'll check out VSR (Viewstamped Replication); I appreciate the recommendation.



My ZooKeeper outages have always come down to simple things, like all the workers being built from the same base image, so when the write rate increased, every worker's disk filled up at the same time.
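A back-of-the-envelope sketch of why identical nodes turn a disk-space problem into a full outage: with the same free space and the same write rate, every node fills at the same moment, so a majority quorum is lost all at once instead of degrading one node at a time. This assumes a constant write rate and that a node with a full disk effectively drops out of the ensemble; the node names and sizes are hypothetical.

```python
from dataclasses import dataclass

GiB = 1024**3


@dataclass
class Node:
    name: str        # hypothetical node name
    free_bytes: int  # free disk space when the higher write rate starts


def hours_until_full(node: Node, write_bytes_per_hour: int) -> float:
    """Hours until a node's disk fills, assuming a constant write rate."""
    return node.free_bytes / write_bytes_per_hour


def quorum_lost_at(nodes: list[Node], write_bytes_per_hour: int) -> float:
    """Hours until a majority quorum can no longer be formed.

    A quorum needs n // 2 + 1 live nodes, so it is lost once
    n - n // 2 nodes have full disks (assumed to drop out).
    """
    fill_times = sorted(hours_until_full(n, write_bytes_per_hour) for n in nodes)
    failures_needed = len(nodes) - len(nodes) // 2
    return fill_times[failures_needed - 1]


# Same image, same free space: all three nodes fill at hour 10 together.
same = [Node(f"zk{i}", 100 * GiB) for i in range(3)]
# Staggered free space: quorum survives the first fill and is lost at hour 15.
staggered = [Node("zk0", 100 * GiB), Node("zk1", 150 * GiB), Node("zk2", 200 * GiB)]
```

Staggering capacity doesn't prevent the failure, but it spreads the fill times out, which buys time for an alert on the first full disk to be acted on.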