A performance modeling framework for microservices-based cloud infrastructures

Thiago Felipe da Silva Pinheiro,Paulo Pereira,Bruno Silva,Paulo Maciel
DOI: https://doi.org/10.1007/s11227-022-04967-6
2022-12-08
Abstract:Microservice architectures (MSAs) can increase the performance of distributed systems and enable better resource allocation by sharing underlying resources among multiple microservices (MSs). One of the main advantages of MSAs is the ability to leverage the elasticity provided by an infrastructure so that only the most demanding services are scaled, which can contribute to efficient allocation of processing resources. A major problem in allocating resources to microservices is determining a set of auto-scaling parameters that will result in all microservices meeting specific service level agreements (SLAs). Since the space of feasible configurations can be vast, manually determining a combination of parameter values that will result in all SLAs being met is complex and time consuming. In addition, the performance overhead caused by running microservices concurrently and the overhead caused by the VM instantiation process must also be evaluated. Another problem is that microservices can suffer performance degradation due to resource contention, which depends on how microservices are distributed across servers. To address the aforementioned issues, this paper proposes the modeling of these infrastructures and their auto-scaling mechanisms in a private cloud using stochastic Petri nets (SPNs), the non-dominated sorting genetic algorithm II (NSGA-II), one of the most popular evolutionary algorithms for multiobjective optimization (MOO), and random forest regression (RFR), an ensemble-learning-based method, to identify critical trade-offs between performance and resource consumption considering all deployed MSs. The SPN-based model is capable of representing both instantiation of elastic VMs and a pool of instantiated elastic VMs where only containers are started. The analytical framework enables service providers (SPs) to estimate performance metrics considering configurations that satisfy all performance constraints, use of elastic VMs, discard rate, discard probability, throughput, response time, and corresponding cumulative distribution functions (CDFs). These metrics are critical because they make it possible to estimate the time required to process each request, the number of requests processed in a time interval, the number of requests rejected, and the utilization of resources. The framework was validated with 95% confidence interval (CI) using a real-world testbed. Two case studies were used to investigate its feasibility by evaluating its application in a real scenario. We noticed a significant improvement in performance when using a pool of elastic VMs, where throughput improved by 21.5% and the number of discarded requests decreased by 70%. The application of the framework can help in finding optimized solutions that support both infrastructure planning and online performance prediction, and enable trade-off analyses considering different scenarios and constraints.
computer science, theory & methods,engineering, electrical & electronic, hardware & architecture
What problem does this paper attempt to address?