I'm on the other side. I think Leslie Lamport asserted that Paxos is minimal, an...

spmurrayzzz · on Sept 27, 2024

I tend to agree that many explanations of Raft don't get into the useful details and hand-wave some of the hard problems. But the original paper does do a good job of this and is pretty accessible to read IMO.

> I read "once the leader has been elected", um, hangon, according to whom? Has node 1 finally agreed on the leader, just while node 3 has given up and started another election?

The simple response I think to "according to whom" is "the majority of voting nodes". When the leader assumes its role, it sends heartbeats which are then accepted by the other nodes in the cluster. Even if (in your example) node 3 starts a new election, it will only succeed if it can get a majority of votes. If node 2 has already acknowledged a leader, it won't vote for node 3 in the same term.
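To make the "it won't vote for node 3 in the same term" point concrete, here is a minimal sketch of the one-vote-per-term rule. The `Node` class and `handle_request_vote` method are hypothetical names for illustration, and the sketch deliberately leaves out the log up-to-date check that real Raft also applies before granting a vote.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical, stripped-down Raft voter (no log, no timers)."""
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None

    def handle_request_vote(self, candidate_id: int, term: int) -> bool:
        # Requests from an older term are always rejected.
        if term < self.current_term:
            return False
        # A newer term resets the vote: one vote per node per term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # Grant only if no vote was cast this term, or it went to the same candidate.
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


node2 = Node(node_id=2)
print(node2.handle_request_vote(candidate_id=1, term=1))  # node 1 asks first
print(node2.handle_request_vote(candidate_id=3, term=1))  # node 3, same term
print(node2.handle_request_vote(candidate_id=3, term=2))  # node 3, next term
```

Note the last call: under this rule node 3 can still win node 2's vote by moving to a higher term, so the per-term rule alone prevents two leaders in one term, not repeated elections.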

There are some implicit concessions inherent there around eventual consistency, but I don't think that's novel to Raft compared to other distributed consensus protocols.

withinboredom · on Sept 28, 2024

> The simple response I think to "according to whom" is "the majority of voting nodes".

Reminds me of this one time we had a Raft cluster arguing over who was the leader for 20 minutes in production. Raft leader election is non-deterministic, while Paxos is deterministic. It can 'randomly' get into a situation it cannot resolve for quite a long time.

spmurrayzzz · on Sept 28, 2024

> Reminds me of this one time we had a Raft cluster arguing over who was the leader for 20 minutes in production

That's certainly an interesting failure mode. Do you recall the details around root cause? I could imagine ephemeral network partitions (flapping interfaces? peering loss?) causing something like this for sure.

In my own experience, I've been running services that use Raft under the hood for the last ~10 years in production and haven't seen this happen myself. Though I do absolutely remember having misconfigured election timeouts causing very painful latency issues in failover scenarios.

withinboredom · on Sept 29, 2024

Root cause was “bad luck” IIRC. Every node voted mostly for itself.

spmurrayzzz · on Sept 29, 2024

Ah, interesting. That sort of split voting is indeed very bad luck, potentially a config-specific issue, or just a cluster that's seeing a catastrophic partition failure between every node.

In canonical Raft, assuming no partition failures, this could only happen if every node's election timeout triggered at roughly the same time and they all became candidates simultaneously. For this state to persist (assuming short election timeouts and short heartbeat intervals), you have to get _really_ unlucky.

In terms of probabilistic likelihood though, this is about as likely as the live-lock issue in Paxos in which multiple proposals with differing proposal IDs are made at the same time. You'd see a similar delay in consensus in that scenario as well. Obviously MultiPaxos handles this with a separate leadership algorithm which makes that outcome much less likely, but the same types of strategies common in those systems to mitigate contention issues can be used in Raft as well (randomized backoffs for example).
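A rough way to see why the spread of randomized election timeouts matters is a toy simulation. Everything here is an assumption for illustration: the function names, the 5-node cluster, the 150 ms base, the 10 ms round trip, and above all the simplified win condition (the first node to time out wins only if it leads the next node by at least one round trip). Real Raft split votes depend on how votes actually divide, so treat the numbers as qualitative only.

```python
import random


def rounds_until_leader(n_nodes, base_ms, spread_ms, rtt_ms, rng, max_rounds=1000):
    """Return how many election rounds pass before one candidate wins,
    or None if nobody wins within max_rounds."""
    for round_no in range(1, max_rounds + 1):
        # Each node picks a fresh randomized timeout every round.
        timeouts = sorted(base_ms + rng.uniform(0, spread_ms) for _ in range(n_nodes))
        # Simplifying assumption: the earliest candidate wins only if it has
        # a head start of at least one round trip over the next node.
        if timeouts[1] - timeouts[0] >= rtt_ms:
            return round_no
    return None


def average_rounds(spread_ms, trials=200, seed=1):
    """Average rounds to elect a leader over several simulated elections."""
    rng = random.Random(seed)
    results = [rounds_until_leader(5, 150.0, spread_ms, 10.0, rng) for _ in range(trials)]
    finished = [r for r in results if r is not None]
    return sum(finished) / len(finished)


print("narrow spread (20 ms):", average_rounds(20.0))
print("wide spread (150 ms):", average_rounds(150.0))
```

In this model a timeout spread that is small relative to the round-trip time makes repeated failed rounds common, which is consistent with the "short timeouts" suspicion in the thread; a zero spread never elects anyone at all.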

withinboredom · on Sept 29, 2024

Yeah, IIRC, we updated the configuration some. I don't remember what specifically, but now that you mention short timeouts, I vaguely remember that coming up as a problem.

ivankelly · on Sept 27, 2024

100% agree. I haven't read the Raft paper in years, but I remember thinking there's just too much stuff in there. That stuff is important, but if you want people to understand what's happening, they need to internalize the fundamental idea of being able to block other writers by bumping a number. Which is all covered in the single-decree Paxos section of The Part-Time Parliament.
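The "block other writers by bumping a number" idea can be sketched with a single acceptor. This is a minimal illustration, not a full single-decree Paxos implementation: the `Acceptor` class and its method names are hypothetical, there is only one acceptor (no quorum), and proposer logic is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical single-decree Paxos acceptor state."""
    promised: int = -1                         # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept only if no higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


acceptor = Acceptor()
acceptor.prepare(1)                 # writer A gets a promise for ballot 1
acceptor.prepare(2)                 # writer B bumps the number to 2
print(acceptor.accept(1, "from A"))  # A is now blocked
print(acceptor.accept(2, "from B"))  # B's write goes through
```

Once writer B's higher ballot is promised, writer A's older ballot is rejected, and any later `prepare` learns the value B wrote.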

kfrzcode · on Sept 27, 2024

Paxos is nice, sure, but Hedera does DLT with aBFT and is much more efficient, as well as being faster and ensuring fairness. It's leaderless, and achieves incredible TPS (10k+ in practice, 100k+ in theory).

I am curious about your thoughts here.