
How do you square that with the fact that shit usually hits the fan precisely because of this complexity, not in spite of it? That's my observation & experience, anyway.

Added bits of "resiliency" often add brand new, unexplored failure points that are just ticking time bombs waiting to bring the entire system down.



Not adding that resiliency isn't the answer either, though - it just means the known failures will get you. Is that better than the unknown failures introduced by your mitigations? I cannot answer that.

I can tell you 100% that eventually a disk will fail. I can tell you 100% that eventually the power will go out. I can tell you 100% that even if you have a computer with redundant power supplies, each connected to a separate grid, eventually both power supplies will fail at the same time - it will just happen a lot less often than with a regular computer on no redundant/backup power. I can tell you that network cables do break from time to time. I can tell you that buildings are vulnerable to earthquakes, fires, floods, tornadoes and other such disasters. I can tell you that software is not perfect and eventually crashes. I can tell you that upgrades are hard if any protocol has changed. I can tell you there is a long list of other known disasters I didn't mention, but a little research will turn them up.

I could look up the odds of each of the above. That in turn allows weighing the cost of each mitigation against the expected cost of not mitigating - but it's only statistics: you may decide something statistically cannot happen, and it happens anyway.
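To make the arithmetic concrete, here's a toy expected-loss comparison in Python (every probability and cost below is invented purely for illustration, and it ignores that mitigations themselves can fail):

    # Toy annualized expected-loss comparison. Mitigate when the expected
    # annual loss from a failure exceeds the annual cost of mitigating it.
    # All numbers are made up for illustration.
    failures = [
        # (name, annual probability, cost if it happens, annual mitigation cost)
        ("disk failure",       0.20,    50_000,     2_000),
        ("dual PSU failure",   0.001,   500_000,    30_000),
        ("building destroyed", 0.0001,  5_000_000,  100_000),
    ]

    for name, p, cost, mitigation in failures:
        expected_loss = p * cost
        decision = "mitigate" if expected_loss > mitigation else "accept the risk"
        print(f"{name}: expected annual loss ${expected_loss:,.0f} "
              f"vs mitigation ${mitigation:,.0f} -> {decision}")

The statistics caveat still applies: the 0.0001-per-year event can happen in year one.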

What I cannot tell you is how much you should mitigate. Each mitigation has a cost that needs to be compared to the value it provides.


Absolutely yeah, these things are hard enough to test in a controlled environment with a single app (e.g. FoundationDB) but practically impossible to test fully in a microservices architecture. It's so nice to have this complexity managed for you in the storage layer.


Microservices almost always increase the number of partial failures, but if applied properly they can reduce the number of critical failures.

You can certainly misapply the architecture, but you can also apply it well. It's unsurprising that most people make bad choices in a difficult domain.


Fault tolerance doesn't necessarily require microservices (as in separate code bases), though - see Erlang. Or even something like Unison.

But for some reason it seems that few people are working on making our programming languages and frameworks fault tolerant.
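For a rough sense of what "fault tolerance in the language/runtime" means, here's a minimal one-for-one supervisor sketch in Python (the worker and restart policy are hypothetical stand-ins; Erlang/OTP supervision trees are far richer than this):

    import multiprocessing
    import random
    import time

    def worker():
        # Hypothetical worker in the "let it crash" style: it makes no
        # attempt to handle its own faults.
        while True:
            time.sleep(0.1)
            if random.random() < 0.01:
                raise RuntimeError("simulated fault")

    def supervise(target, max_restarts=5):
        # One-for-one supervisor: restart the child when it dies,
        # escalate after too many restarts.
        restarts = 0
        while restarts <= max_restarts:
            child = multiprocessing.Process(target=target)
            child.start()
            child.join()              # returns when the child exits or crashes
            if child.exitcode == 0:
                return                # clean exit, nothing to restart
            restarts += 1
            time.sleep(1)             # crude backoff before restarting
        raise RuntimeError("child keeps crashing; escalate to a parent supervisor")

    if __name__ == "__main__":
        supervise(worker)

The point is that the failure boundary (a process) and the recovery policy live inside one code base, not across service boundaries.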


Because path dependence is real, so we're mostly building on top of a tower of shit. And as computers got faster, it became more reasonable to accept huge amounts of overhead. Same reason Docker exists at all.


> How do you square that with the fact that shit usually hits the fan precisely because of this complexity

The theoretical benefit may not be what most teams actually experience. The fact that microservices are usually reached for to solve problems that could be solved in much simpler ways is a pretty good indication that any theoretical benefits will be lost to other poor decision-making.



