Nowadays, software companies put a lot of effort into ensuring the reliability of their systems and services. Carrying out functional and non-functional tests is an integral part of the entire software development process. Resilience testing belongs to the second group and ensures that applications perform well in real-life conditions.
One way of improving the resilience of software is to host it on cloud servers. However, this is not enough, because system failures still occur. The conclusion is that the best defense against unexpected failures is to fail often. Chaos Monkey lets us adopt this approach, following the Principles of Chaos Engineering.
Why should we use it? Chaos Monkey helps us verify whether our fallbacks are properly defined and whether network latency and service breakdowns negatively impact our system. We should run Chaos Monkey in our staging environment and monitor how the system behaves, ideally while simulating high traffic with load tests. If the metrics confirm that we can control the chaos, nothing prevents us from running Chaos Monkey in the production environment as well. Such experiments make it much less likely that a failure will catch us by surprise in the future.
In this article, we will focus on a dedicated implementation, Chaos Monkey for Spring Boot (CM4SB), which is maintained by Codecentric.
Let’s meet the monkeys that make chaos
Chaos Monkey consists of Watchers and Assaults. The graph shows how they depend on each other.
Watcher is a Spring component that defines an aspect for applying additional logic. A dedicated watcher is created for each of the supported Spring annotations, and it can carry out assaults on all public methods in the corresponding classes.
Supported Spring annotations:
- @Controller
- @RestController
- @Service
- @Repository
- @Component
Assault is one of the attack types carried out by a watcher. The following assaults are available:
- Latency Assault
- Exception Assault
- KillApp Assault
- Memory Assault
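To make the watcher/assault split concrete, here is a toy sketch in plain Java. It is not CM4SB's implementation: the library relies on Spring AOP aspects, whereas this example uses a JDK dynamic proxy (java.lang.reflect.Proxy) to wrap a target object and run an assault before every interface method call. OfficeService, Assault and ToyWatcher are hypothetical names chosen for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical business interface standing in for a @Service bean.
interface OfficeService {
    String open(String office);
}

// Hypothetical assault: logic that runs before the real method call.
interface Assault {
    void attack();
}

// Toy "watcher": wraps a target so that every interface method call goes through the assault first.
// CM4SB uses Spring AOP aspects instead; a JDK dynamic proxy is used here only to illustrate the idea.
final class ToyWatcher {
    private ToyWatcher() {
    }

    @SuppressWarnings("unchecked")
    static <T> T watch(Class<T> type, T target, Assault assault) {
        InvocationHandler handler = (proxy, method, args) -> {
            assault.attack(); // e.g. sleep (latency) or throw (exception)
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception, not the reflective wrapper
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

An assault that sleeps approximates Latency Assault, and one that throws an unchecked exception approximates Exception Assault, because such exceptions propagate from the proxy to the caller unchanged.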
How does it work?
Chaos Monkey is initialised by activating the chaos-monkey application profile. Spring Boot then loads the Chaos Monkey configuration and creates the required beans. Public classes are scanned to find those that carry one of the supported annotations, and watchers can then carry out assaults on them. The set of targeted components can be narrowed down with a configuration property that explicitly lists the classes we want to attack. There are four types of attacks. More than one can be active at the same time, and some configuration properties are shared between them. The process of making chaos can be monitored through dedicated metrics.
Integration
Integrating Chaos Monkey into a Spring Boot application is simple and requires no code changes. The only things we need to do are:
- add the chaos-monkey-spring-boot dependency:
<dependency>
    <groupId>de.codecentric</groupId>
    <artifactId>chaos-monkey-spring-boot</artifactId>
    <version>2.2.0</version>
</dependency>
- initialise Chaos Monkey by activating the Spring profile chaos-monkey:
spring:
  profiles:
    active: chaos-monkey
There is only one requirement: our project must contain a Spring web dependency (spring-boot-starter-web or spring-boot-starter-webflux).
Chaos Monkey is disabled by default, so we can keep calm. No unexpected behaviour will surprise us until we enable it in the configuration.
Configuration
Chaos Monkey lets us set the configuration in two different ways:
- statically, in a configuration file (application.yml or application.properties),
- at runtime, through the exposed /actuator/chaosmonkey endpoint.
Both approaches can be combined. All properties provided by CM4SB can be set statically in the configuration file, but this is a good approach only in small, individual cases. Runtime configuration, on the other hand, has one limitation: we cannot change which watchers are active or inactive.
The best approach is to define watchers statically and set all other properties later at runtime. Furthermore, Chaos Monkey should not be enabled in the configuration file (just leave it at the default: disabled).
The minimal required configuration defines which Spring components should be attacked by Chaos Monkey. By default, only the service watcher is enabled, so if we do not want our service components to be attacked, we have to disable it explicitly. Setting flags for all other watchers is not required; in the configuration below, we set them explicitly just to show the available parameters.
chaos:
  monkey:
    assaults:
      latencyActive: false
    watcher:
      restController: true
      controller: false
      service: false # true by default
      repository: false
      component: false
It is also possible to limit Chaos Monkey’s area of destruction. To achieve this, we can set the watchedCustomServices property to choose only the services that we want to attack. It is a list of fully qualified public class names and method names. The corresponding watcher must be enabled; otherwise, the listed components will not be attacked.
"watchedCustomServices": [
  "com.softwarehut.office.controller.OfficeController.open",
  "com.softwarehut.office.controller.OfficeController.close",
  "com.softwarehut.office.controller.CompanyController"
]
Runtime configuration
Chaos Monkey offers built-in endpoints, exposed via HTTP or JMX, that allow us to change the configuration at runtime. We chose HTTP. To enable it, a configuration similar to the one below is required:
management:
  endpoint:
    chaosmonkey:
      enabled: true
  endpoints:
    web:
      exposure:
        include: chaosmonkey
The exposed endpoints (relative to /actuator) allow us to:
- get the current Chaos Monkey configuration:
GET /chaosmonkey
- check whether Chaos Monkey is enabled or disabled:
GET /chaosmonkey/status
- enable or disable Chaos Monkey:
POST /chaosmonkey/enable
POST /chaosmonkey/disable
- get the statuses of watchers:
GET /chaosmonkey/watchers
- get and modify the configuration of assaults:
GET /chaosmonkey/assaults
POST /chaosmonkey/assaults
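As a sketch of calling these endpoints from Java, the class below builds the requests with the JDK's java.net.http API. The base URL (for example http://localhost:8080/actuator) is an assumption about the deployment, and ChaosMonkeyRequests is a hypothetical helper. It only builds requests, so it can be inspected without a running application.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds requests for the Chaos Monkey actuator endpoints.
// The base URL passed in (host, port, /actuator prefix) is an assumption about the deployment.
final class ChaosMonkeyRequests {
    private final String baseUrl;

    ChaosMonkeyRequests(String baseUrl) {
        // Drop a trailing slash so paths can be appended uniformly.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    HttpRequest status() {
        return get("/chaosmonkey/status");
    }

    HttpRequest watchers() {
        return get("/chaosmonkey/watchers");
    }

    HttpRequest enable() {
        return post("/chaosmonkey/enable", HttpRequest.BodyPublishers.noBody());
    }

    HttpRequest disable() {
        return post("/chaosmonkey/disable", HttpRequest.BodyPublishers.noBody());
    }

    // POST /chaosmonkey/assaults with a JSON body such as {"level": 3, "latencyActive": true}.
    HttpRequest configureAssaults(String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/chaosmonkey/assaults"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    private HttpRequest get(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
    }

    private HttpRequest post(String path, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path)).POST(body).build();
    }
}
```

To actually send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). If the actuator endpoints are secured, authentication headers would have to be added as well.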
Carry out the assaults
Chaos Monkey for Spring Boot allows us to conduct 4 different types of attacks. They can be grouped by application context or type of activation.
The first group contains Latency Assault and Exception Assault. Both depend on incoming requests. To set how often they occur, we specify the level. For example, level=3 means that roughly every third request will be attacked by Chaos Monkey.
"level": 3,
The second group contains Memory Assault and KillApp Assault. They do not depend on requests; instead, they are activated by a scheduler.
The cron expression is set as the value of the runtimeAssaultCronExpression configuration parameter.
"runtimeAssaultCronExpression": "*/1 * * * * ?"
Latency Assault
The latency range is defined by two parameters: latencyRangeStart and latencyRangeEnd. Their values are validated and can’t be lower than 1 or higher than java.lang.Integer.MAX_VALUE. The time unit is milliseconds [ms].
In the request below, the flags after latencyRangeEnd disable the other assaults.
POST /chaosmonkey/assaults
{
  "level": 3,
  "latencyActive": true,
  "latencyRangeStart": 2000,
  "latencyRangeEnd": 9000,
  "exceptionsActive": false,
  "killApplicationActive": false,
  "restartApplicationActive": false
}
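A minimal sketch of the latency assault logic, assuming the delay is drawn uniformly from the configured range with both bounds inclusive. The lower-bound check mirrors the validation described above; because the parameters are int, the Integer.MAX_VALUE upper limit is enforced by the type itself. The start-not-after-end check and the class name LatencyAssaultSketch are assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical latency assault: validates the range and sleeps for a random delay within it.
final class LatencyAssaultSketch {
    private final int rangeStart;
    private final int rangeEnd;

    LatencyAssaultSketch(int rangeStart, int rangeEnd) {
        if (rangeStart < 1 || rangeEnd < 1) {
            throw new IllegalArgumentException("latency bounds must be >= 1 ms");
        }
        if (rangeStart > rangeEnd) { // assumption: an inverted range is rejected
            throw new IllegalArgumentException("latencyRangeStart must not exceed latencyRangeEnd");
        }
        this.rangeStart = rangeStart;
        this.rangeEnd = rangeEnd;
    }

    /** Delay in milliseconds, inclusive of both bounds; long arithmetic avoids overflow at MAX_VALUE. */
    long nextDelayMillis() {
        return ThreadLocalRandom.current().nextLong(rangeStart, (long) rangeEnd + 1);
    }

    /** Blocks the calling thread, which is how the extra latency reaches the caller. */
    void attack() throws InterruptedException {
        Thread.sleep(nextDelayMillis());
    }
}
```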
Exception Assault
Chaos Monkey allows us to set a custom exception to be thrown. To use an exception, we pass its fully qualified class name (including the package) as type and the arguments of its constructor in the arguments array.
In the request below, the flags after exception disable the other assaults.
POST /chaosmonkey/assaults
{
  "level": 3,
  "exceptionsActive": true,
  "exception": {
    "type": "java.lang.RuntimeException",
    "arguments": [ { "className": "java.lang.String", "value": "Exception assault has been carried out" } ]
  },
  "latencyActive": false,
  "killApplicationActive": false,
  "restartApplicationActive": false
}
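The type/arguments structure maps naturally onto Java reflection. The hypothetical ExceptionFactory below loads the exception class by name, resolves each argument's className, and calls the matching public constructor. It is an illustration rather than CM4SB's own code, and to stay short it supports only java.lang.String arguments.

```java
import java.util.List;

// Hypothetical factory: builds an exception from a type name and (className, value) arguments.
final class ExceptionFactory {
    // One entry of the "arguments" array.
    static final class Argument {
        final String className;
        final String value;

        Argument(String className, String value) {
            this.className = className;
            this.value = value;
        }
    }

    private ExceptionFactory() {
    }

    static Throwable create(String type, List<Argument> arguments) throws ReflectiveOperationException {
        // asSubclass fails fast with ClassCastException if the type is not a Throwable.
        Class<? extends Throwable> exceptionClass = Class.forName(type).asSubclass(Throwable.class);
        Class<?>[] parameterTypes = new Class<?>[arguments.size()];
        Object[] values = new Object[arguments.size()];
        for (int i = 0; i < arguments.size(); i++) {
            Argument argument = arguments.get(i);
            parameterTypes[i] = Class.forName(argument.className);
            if (parameterTypes[i] != String.class) {
                throw new IllegalArgumentException("this sketch only supports java.lang.String arguments");
            }
            values[i] = argument.value;
        }
        // Look up the public constructor with exactly these parameter types and invoke it.
        return exceptionClass.getConstructor(parameterTypes).newInstance(values);
    }
}
```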
Memory Assault
Memory Assault attacks the memory of the Java Virtual Machine. For more configuration details, take a look at the official documentation.
In the request below, the flags after runtimeAssaultCronExpression disable the other assaults.
POST /chaosmonkey/assaults
{
  "memoryActive": true,
  "memoryMillisecondsHoldFilledMemory": 90000,
  "memoryMillisecondsWaitNextIncrease": 100,
  "memoryFillIncrementFraction": 0.90,
  "memoryFillTargetFraction": 0.95,
  "runtimeAssaultCronExpression": "*/1 * * * * ?",
  "latencyActive": false,
  "exceptionsActive": false,
  "killApplicationActive": false
}
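To get a feel for how the fill parameters interact, the sketch below computes a sequence of allocation steps without allocating anything. The semantics are assumptions for illustration (each step fills memoryFillIncrementFraction of the memory that is still free, and filling stops at memoryFillTargetFraction of the maximum); check the official documentation for the exact behaviour. MemoryFillPlan is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Arithmetic sketch of a memory-filling plan under the assumed semantics described above.
final class MemoryFillPlan {
    private MemoryFillPlan() {
    }

    static List<Long> increments(long maxBytes, long usedBytes, double incrementFraction, double targetFraction) {
        if (maxBytes <= 0 || incrementFraction <= 0 || incrementFraction > 1
                || targetFraction <= 0 || targetFraction > 1) {
            throw new IllegalArgumentException("invalid memory or fraction values");
        }
        long targetBytes = (long) (maxBytes * targetFraction);
        List<Long> steps = new ArrayList<>();
        long used = usedBytes;
        while (used < targetBytes) {
            long free = maxBytes - used;
            long step = Math.max(1, (long) (free * incrementFraction)); // always make progress
            step = Math.min(step, targetBytes - used);                  // never overshoot the target
            steps.add(step);
            used += step;
        }
        return steps;
    }
}
```

For example, with 1000 bytes available, nothing used, an increment fraction of 0.25 and a target of 0.5, the plan is 250, 187 and 63 bytes. In a running JVM, the inputs could come from Runtime.getRuntime().maxMemory(), with the used amount computed as totalMemory() minus freeMemory().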
KillApp Assault
This one is simple: it just shuts down the application according to the schedule. It runs only when a schedule is defined, that is, when a cron expression is set in the runtimeAssaultCronExpression configuration parameter.
In the request below, the flags after runtimeAssaultCronExpression disable the other assaults.
POST /chaosmonkey/assaults
{
  "killApplicationActive": true,
  "runtimeAssaultCronExpression": "*/1 * * * * ?",
  "latencyActive": false,
  "exceptionsActive": false,
  "memoryActive": false
}
Monitoring
Chaos Monkey for Spring Boot has no built-in report generation. Instead, it provides metrics that we can visualize with independent tools. We used:
- Micrometer for exposing metrics in the Prometheus format,
- Prometheus for gathering metrics,
- Grafana for visualizing the application state during the assaults.
Testing
To observe Chaos Monkey at work, we generated some traffic. For this purpose, we used Gatling, which allows us to simulate multiple users making requests to our application at the same time. Every Gatling test scenario looks as follows:
- enable Chaos Monkey,
- enable and configure assault type,
- make a bunch of requests to the application’s API endpoints,
- disable Chaos Monkey.
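The same four steps can be outlined in plain Java. This is not Gatling code: ChaosClient is a hypothetical interface that would wrap the actuator endpoints and the application's API, and the calls run sequentially, whereas Gatling simulates many concurrent users.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the actuator endpoints and the application's API.
interface ChaosClient {
    void enableChaosMonkey();

    void configureAssaults(String json);

    int callApi(String path); // returns the HTTP status code

    void disableChaosMonkey();
}

final class ChaosScenario {
    private ChaosScenario() {
    }

    /** Enables Chaos Monkey, configures the assault, calls the API, then disables Chaos Monkey. */
    static List<Integer> run(ChaosClient client, String assaultJson, List<String> paths, int repetitions) {
        client.enableChaosMonkey();
        try {
            client.configureAssaults(assaultJson);
            List<Integer> statuses = new ArrayList<>();
            for (int i = 0; i < repetitions; i++) {
                for (String path : paths) {
                    statuses.add(client.callApi(path));
                }
            }
            return statuses;
        } finally {
            // Always switch Chaos Monkey off, even if a call fails midway.
            client.disableChaosMonkey();
        }
    }
}
```

Disabling Chaos Monkey in a finally block ensures the assaults are switched off even when a request throws, so a failed run does not leave the environment under attack.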
Micrometer
To expose the relevant metrics, we used micrometer-registry-prometheus and spring-boot-starter-actuator. Their versions are inherited indirectly from the Spring Boot starter parent.
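<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>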
Prometheus
A Prometheus instance gathers metrics from all microservices that integrate the Chaos Monkey library, and it is used as a data source in Grafana.
Grafana
We used the Grafana dashboard with id=9845 and made some improvements: more readable labels and dedicated panels for Chaos Monkey watchers.
More configuration details and examples can be found on GitHub.
Conclusion
Chaos Monkey for Spring Boot is a mature tool for conducting resilience tests. Its main advantage is how simply it can be applied to existing systems: it is dedicated to Spring Boot applications, needs no code modifications, and requires only a brief configuration. All other settings can be changed at runtime, which makes it highly flexible.
Chaos Monkey provides four assault types that can be carried out at any time at runtime. Their effects are recorded in the exposed metrics. There is no built-in reporting functionality, but those metrics can be consumed and visualised in external tools.