

Chaos Engineering

Chaos engineering, also known as chaos testing, is a method and toolset for deliberately introducing failures and outages into a system in order to learn how it responds.

In any sufficiently complex software system, failure is inevitable. The chaos engineering approach was pioneered by Netflix, which first created its chaos engineering process in 2010 and posted about it in detail in 2014.

Chaos Engineering and the Simian Army

In chaos engineering, a set of automated processes, known collectively at Netflix as the “Simian Army”, is used to introduce various types of system failures. The colorful naming of these tools evokes the mental image of chaos testing as a group of monkeys wreaking unexpected havoc in a data center, an event for which engineers must prepare as best they can.

Knowing for sure how a complex system will react to failures is practically impossible. The only way to predict the results of failures — especially catastrophic or cascading failures — is to have them happen. Therefore, creating those failures yourself — in a controlled way and at a time of your choosing — via chaos engineering is a valuable learning exercise.
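As a rough illustration of creating failures in a controlled way, the sketch below wraps a function so that a chosen fraction of calls fail on purpose. It is not taken from Netflix's tooling or any real chaos library: the names with_chaos, InjectedFailure, fetch_profile, and fetch_profile_with_fallback are all hypothetical, and the failure rate is a parameter you would pick for your own experiment.

```python
import random


class InjectedFailure(Exception):
    """Raised in place of a real outage during an experiment (hypothetical)."""


def with_chaos(func, failure_rate, rng=None):
    """Wrap func so that roughly failure_rate of its calls fail on purpose."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        # random() returns a value in [0, 1), so a rate of 0.0 never fails
        # and a rate of 1.0 always fails.
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def fetch_profile(user_id):
    # Stand-in for a call to a real dependency (hypothetical).
    return {"id": user_id, "name": "example"}


def fetch_profile_with_fallback(fetch, user_id):
    # The behaviour under test: does the caller degrade gracefully?
    try:
        return fetch(user_id)
    except InjectedFailure:
        return {"id": user_id, "name": None, "degraded": True}
```

Calling fetch_profile_with_fallback(with_chaos(fetch_profile, 0.1), 7) would then exercise the fallback path on about one call in ten, which is the kind of observation a chaos experiment is meant to produce.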

Understanding the failure modes of your system is particularly important if you have high expectations around reliability, or if you are operating in a less reliable environment — on top of cloud infrastructure, for example. However, injecting chaos requires a certain level of preparedness. You might want to try it out in a pre-production environment first!
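One way to express that preparedness in code is a guard that refuses to inject failures unless they have been explicitly switched on in an approved environment. The following is only a sketch under assumptions: the variable names APP_ENV and CHAOS_ENABLED and the environment names "staging" and "test" are invented for illustration, and a real setup might use a feature flag or configuration service instead.

```python
import os

# Hypothetical list of environments where experiments may run.
ALLOWED_ENVIRONMENTS = {"staging", "test"}


def chaos_allowed(environ=None):
    """Return True only if chaos is switched on in a pre-production environment."""
    environ = os.environ if environ is None else environ
    # Default to the safest assumptions when a variable is missing.
    env = environ.get("APP_ENV", "production")
    enabled = environ.get("CHAOS_ENABLED", "false").lower() == "true"
    return enabled and env in ALLOWED_ENVIRONMENTS
```

Defaulting to "production" and "false" means a missing or misspelled setting turns experiments off rather than on.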
