It's amazing to me that something I threw out in a meeting six years ago became an entire engineering culture.
It all started when we were talking about what to call the folks who were building things like Chaos Monkey/Gorilla/Kong and I said, "Let's call them Chaos Engineers, since they are engineering chaos". And so we adopted that title at Netflix, and now here we are.
I should also point out here how valueless what I did was -- I literally just came up with the name for what we were already doing inside Netflix. Everyone else actually wrote it down and spread it outside of Netflix. Their contributions to spreading the word are far more important than the name I came up with.
Chaos Engineering is fault injection incorporated as a core aspect of a production system. Whereas in the 1970s fault injection might have been used mostly in the development and test phases, Chaos Engineering also includes a set of persistent services that knock out key parts of your production system.
Yes; to expand on that a bit -- you can do fault injection in a chaos experiment. You can also do many other things within an experiment that would be difficult to classify as fault injection; for example, a sharp increase in the number of customers making requests to your service isn't really a fault. Chaos Engineering then wraps all of those techniques in a framework of experimentation and analysis. That's the discipline, which encourages you to proactively make improvements to Availability, and now Security as well.
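To make that distinction a bit more concrete, here is a rough Python sketch of an experiment wrapper. Everything in it is made up for illustration (`ToyService`, `kill_replica`, `run_experiment`, the 0.99 threshold) and it is not how Chaos Monkey or any Netflix tool works. It only shows that a fault and a non-fault perturbation such as a traffic spike can share the same hypothesis-and-measure loop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyService:
    """Hypothetical stand-in for a service with redundant replicas."""
    replicas: int = 3
    healthy: int = 3
    capacity_per_replica: int = 100  # requests one replica can absorb per tick

    def success_rate(self, requests: int) -> float:
        """Fraction of requests served, given the current healthy capacity."""
        if requests == 0:
            return 1.0
        capacity = self.healthy * self.capacity_per_replica
        return min(1.0, capacity / requests)


def kill_replica(svc: ToyService) -> None:
    """Fault injection: take one replica out of rotation."""
    svc.healthy = max(0, svc.healthy - 1)


def no_fault(svc: ToyService) -> None:
    """No fault at all; the perturbation comes from the request volume."""


@dataclass
class Result:
    name: str
    baseline: float
    observed: float
    hypothesis_held: bool


def run_experiment(
    name: str,
    svc: ToyService,
    perturb: Callable[[ToyService], None],
    requests_before: int,
    requests_during: int,
    threshold: float = 0.99,  # assumed steady-state hypothesis
) -> Result:
    """Measure a baseline, perturb, measure again, then always roll back."""
    baseline = svc.success_rate(requests_before)
    perturb(svc)
    try:
        observed = svc.success_rate(requests_during)
    finally:
        svc.healthy = svc.replicas  # restore the system after the experiment
    return Result(name, baseline, observed, observed >= threshold)


svc = ToyService()
fault = run_experiment("kill one replica", svc, kill_replica, 150, 150)
spike = run_experiment("3x traffic spike", svc, no_fault, 150, 450)
for r in (fault, spike):
    print(f"{r.name}: {r.baseline:.2f} -> {r.observed:.2f}, held={r.hypothesis_held}")
```

In this toy setup the service survives losing a replica but not the traffic spike, which is the kind of finding the analysis step is meant to surface before a real incident does.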
Naming things is notoriously difficult and you nailed it. Bravo.
I clearly remember the Chaos Monkey posts and how they resonated with our team, as we were building an ambitious real-time project at the time. Thanks for the inspiration booster o/
At my old job, we were creating a new mobile app, and as a joke I suggested a name pretty close to Grindr (it was a completely unrelated app). The CEO was oblivious to my sarcasm and decided it was a great name for the app, so that's what we ran with.
This made me laugh. I get shocked looks of disbelief and claims that AWS are basically gods when I tell people that we occasionally lose packets in eu-west-1. The only thing we gained is having to argue with one vendor when that happens, not two.
I mean, you mostly just need blood and skulls, it doesn't need to be fancy - like any gift, it's the thought that counts. Excess bones and blood can be turned into glue [1] or filler material to increase the structural robustness.
Reminds me of an anecdote about a CTO that pulled the plug on their main production database while giving a tour to investors. He had complete confidence that the solutions they had in place would deal with the problem gracefully.
That's the kind of confidence I'd like to have some day.
Has anybody tried this sort of break-stuff experiment in financial institutions (large banks)? Has the chaos you caused affected any customers and made them report