
How do you square that with the fact that shit usually hits the fan precisely because of this complexity, not in spite of it? That's my observation & experience, anyway.

Added bits of "resiliency" often add brand new, unexplored failure points that are just ticking time bombs waiting to bring the entire system down.



Not adding that resiliency isn't the answer either, though - it just means the known failures will get you. Is that better than the unknown failures introduced by your mitigations? I cannot answer that.

I can tell you 100% that eventually a disk will fail. I can tell you 100% that eventually the power will go out. I can tell you 100% that even if you have a computer with redundant power supplies, each connected to a separate grid, eventually both power supplies will fail at the same time - it will just happen a lot less often than with a regular computer on no redundant/backup power. I can tell you that network cables do break from time to time. I can tell you that buildings are vulnerable to earthquakes, fires, floods, tornadoes and other such disasters. I can tell you that software is not perfect and eventually crashes. I can tell you that upgrades are hard if any protocol has changed. I can tell you there is a long list of other known disasters I didn't mention, but a little research will turn them up.

I could look up the odds of each of the above. That in turn allows weighing the cost of each mitigation against the expected cost of not mitigating - but it's only statistics: you may decide something statistically cannot happen, and it happens anyway.
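To make the arithmetic concrete, here's a toy expected-loss comparison in Python (every probability and cost below is invented purely for illustration, and it ignores that mitigations themselves can fail):

    # Toy annualized expected-loss comparison. Mitigate when the expected
    # annual loss from a failure exceeds the annual cost of mitigating it.
    # All numbers are made up for illustration.
    failures = [
        # (name, annual probability, cost if it happens, annual mitigation cost)
        ("disk failure",       0.20,    50_000,     2_000),
        ("dual PSU failure",   0.001,   500_000,    30_000),
        ("building destroyed", 0.0001,  5_000_000,  100_000),
    ]

    for name, p, cost, mitigation in failures:
        expected_loss = p * cost
        decision = "mitigate" if expected_loss > mitigation else "accept the risk"
        print(f"{name}: expected annual loss ${expected_loss:,.0f} "
              f"vs mitigation ${mitigation:,.0f} -> {decision}")

The statistics caveat still applies: the 0.0001-per-year event can happen in year one.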

What I cannot tell you is how much you should mitigate. Each mitigation has a cost that needs to be compared to the value it provides.


Absolutely yeah, these things are hard enough to test in a controlled environment with a single app (e.g. FoundationDB) but practically impossible to test fully in a microservices architecture. It's so nice to have this complexity managed for you in the storage layer.


Microservices almost always increase the number of partial failures, but if applied properly they can reduce the number of critical failures.

You can certainly misapply the architecture, but you can also apply it well. It's unsurprising that most people make bad choices in a difficult domain.


Fault tolerance doesn't necessarily require microservices (as in separate code bases), though - see Erlang. Or even something like Unison.

But for some reason it seems that few people are working on making our programming languages and frameworks fault tolerant.
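For a rough sense of what "fault tolerance in the language/runtime" means, here's a minimal one-for-one supervisor sketch in Python (the worker and restart policy are hypothetical stand-ins; Erlang/OTP supervision trees are far richer than this):

    import multiprocessing
    import random
    import time

    def worker():
        # Hypothetical worker in the "let it crash" style: it makes no
        # attempt to handle its own faults.
        while True:
            time.sleep(0.1)
            if random.random() < 0.01:
                raise RuntimeError("simulated fault")

    def supervise(target, max_restarts=5):
        # One-for-one supervisor: restart the child when it dies,
        # escalate after too many restarts.
        restarts = 0
        while restarts <= max_restarts:
            child = multiprocessing.Process(target=target)
            child.start()
            child.join()              # returns when the child exits or crashes
            if child.exitcode == 0:
                return                # clean exit, nothing to restart
            restarts += 1
            time.sleep(1)             # crude backoff before restarting
        raise RuntimeError("child keeps crashing; escalate to a parent supervisor")

    if __name__ == "__main__":
        supervise(worker)

The point is that the failure boundary (a process) and the recovery policy live inside one code base, not across service boundaries.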


Because path dependence is real, so we're mostly building on top of a tower of shit. And as computers got faster, it became more reasonable to accept huge amounts of overhead. Same reason Docker exists at all.


> How do you square that with the fact that shit usually hits the fan precisely because of this complexity

The theoretical benefit may not be what most teams actually experience. The fact that microservices are usually reached for to solve problems that could be solved in much simpler ways is a pretty good indication that any theoretical benefits will be lost to other poor decision-making.



