Model-based reinforcement learning for service mesh fault resiliency in a web application-level

Fanfei Meng, Lalita Jagadeesan, Marina Thottan
DOI: https://doi.org/10.54254/2755-2721/43/20230817
2024-02-26
Abstract: Microservice-based architectures enable different aspects of applications to be created and updated independently, even after deployment. Associated technologies such as service mesh provide fault resiliency through attribute configurations that govern self-adaptive application-level behavior in response to failures, in a manner transparent to the application and its constituent microservices. While this provides tremendous flexibility, the configured values of these attributes and the relationships among them can significantly affect the performance and fault resilience of the overall application. It is thus important to perform fault injection and load testing on the application prior to full deployment. However, given the large number of possible attribute combinations and the complexities of the distributed system underlying microservices and service mesh architectures, it is virtually impossible to determine through traditional software development practices the worst combinations of attribute values and load settings with respect to self-adaptive application-level fault resiliency. To this end, we present a model-based reinforcement learning approach that determines the combinations of attribute and load settings that result in the most significant fault resilience behaviors at an application level. We validate our approach through a case study on a simple request-response service using the Istio service mesh. Our analysis shows that, even for a simple service, our model-based reinforcement learning approach outperforms a baseline selection of action parameters. Further, we show that communicative multi-agent reinforcement learning improves on the performance of both the non-communicative single-agent and multi-agent learning paradigms.