What problem does this paper attempt to address?

This paper addresses the complexity of resource management for microservice architectures in cloud computing. Specifically, it identifies three challenges:

1. **Large action space**: Because application behavior changes frequently, resource management decisions must be made online, so the resource manager must search the space of all possible per-microservice resource allocations efficiently. Suppose there are \(N\) microservice tiers and a pool of \(C\) (\(C \geq N\)) homogeneous physical cores, each with \(F\) frequency levels. The size of the action space is then \(\binom{C - 1}{N - 1}\cdot F^{N}\): there are \(\binom{C - 1}{N - 1}\) ways to give every tier at least one core, and each tier independently picks one of \(F\) frequencies. For example, in a cluster with 150 cores and 10 frequency steps per tier, the resource allocation space for the social network application contains \(7.78\times 10^{55}\) configurations. Evaluating every configuration under different loads would take a prohibitive amount of time and compute, so resource scheduling needs efficient action-space pruning and statistical models that generalize well.

2. **Queuing effect of latency**: Consider a queuing system with processing throughput \(T_o\) and a latency Quality of Service (QoS) target \(Q\), where \(T_o\) is a non-decreasing function of the allocated resources \(R\). To meet QoS while using the least \(R\), \(T_o\) should equal or slightly exceed the input load \(T_i\). If \(R\) is reduced so that \(T_o < T_i\), QoS is not violated immediately, because the queue takes time to build up. Conversely, once QoS is violated, adding resources immediately does not fix it right away, because the accumulated queue takes a long time to drain.
Multi-tier microservices form a complex queuing system, with queues both within and between microservices. This queuing effect means a machine-learning model must evaluate the long-term impact of resource management actions and keep the resource manager from reducing resources too aggressively, which would cause a long recovery period. To avoid QoS violations, the manager must add resources in advance; otherwise a violation becomes unavoidable even if more resources are allocated later.

3. **Inter-tier dependencies**: Dependent microservices are not perfect pipelines, so they can introduce back-pressure effects that are hard to detect and prevent. Specific Remote Procedure Call (RPC) and data-storage API implementations can further exacerbate these dependencies. The resource scheduler therefore needs a global view and must be able to predict how dependencies affect end-to-end performance.

To address these problems, the paper proposes Sinan, a machine-learning-based cluster manager that uses tracing data collected in the cloud and a set of practical machine-learning techniques to infer how resource allocation affects end-to-end performance, and then allocates appropriate resources to each tier. Sinan takes a hybrid approach: a Convolutional Neural Network (CNN) predicts the end-to-end latency in the next decision interval, and a Boosted Trees model predicts the probability of a QoS violation further into the future. This approach improves resource efficiency while maintaining QoS.
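The action-space size from the **Large action space** point can be checked numerically. Splitting \(C\) identical cores among \(N\) tiers with at least one core each gives \(\binom{C-1}{N-1}\) allocations (stars and bars), and letting each tier independently choose one of \(F\) frequency levels multiplies this by \(F^{N}\). The sketch below assumes \(N = 27\) tiers for the social network application; that value is not stated in the summary and was back-solved so that the formula reproduces the quoted \(7.78\times 10^{55}\) for \(C = 150\), \(F = 10\).

```python
import math


def action_space_size(cores: int, tiers: int, freq_levels: int) -> int:
    """Number of (core split, per-tier frequency) configurations.

    Core split: distribute `cores` identical cores over `tiers` tiers with at
    least one core per tier -> C(cores - 1, tiers - 1) (stars and bars).
    Frequencies: each tier independently picks one of `freq_levels` -> F**N.
    """
    if tiers < 1 or cores < tiers:
        raise ValueError("need at least one core per tier")
    return math.comb(cores - 1, tiers - 1) * freq_levels ** tiers


# ASSUMPTION: 27 tiers, chosen because it reproduces the quoted figure.
size = action_space_size(cores=150, tiers=27, freq_levels=10)
print(f"{size:.2e}")  # roughly 7.78e+55
```

Swapping the exponent to \(N^{F}\) gives about \(10^{43}\) for the same inputs, far from the quoted figure, which is a quick way to confirm that the frequency term grows exponentially in the number of tiers.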
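The asymmetry described under **Queuing effect of latency** can be reproduced with a toy fluid-queue model: in each decision interval the backlog grows by the input load \(T_i\) and shrinks by the throughput \(T_o\) that the current allocation provides, and queueing delay is roughly estimated as backlog divided by throughput (Little's law). All rates, the QoS target, and the names below are hypothetical, chosen only to make the effect visible; this is not the paper's model.

```python
from typing import List, Sequence


def simulate_backlog(arrival: float, throughputs: Sequence[float]) -> List[float]:
    """Fluid approximation of a FIFO queue, one entry per decision interval."""
    backlog = 0.0
    history = []
    for t_o in throughputs:
        # Arrivals T_i join the queue; the allocation serves up to T_o of it.
        backlog = max(0.0, backlog + arrival - t_o)
        history.append(backlog)
    return history


ARRIVAL = 100.0   # T_i, requests per interval (hypothetical)
FULL = 105.0      # T_o with enough resources (slightly above T_i)
CUT = 80.0        # T_o after an aggressive resource cut
QOS = 1.0         # QoS target Q, in intervals of queueing delay (hypothetical)

# 5 intervals at FULL, 10 intervals with resources cut, then restored.
throughputs = [FULL] * 5 + [CUT] * 10 + [FULL] * 60
backlog = simulate_backlog(ARRIVAL, throughputs)
# Little's law estimate: waiting time ~= backlog / service rate.
delay = [q / t_o for q, t_o in zip(backlog, throughputs)]

violations = [i for i, d in enumerate(delay) if d > QOS]
first_violation, last_violation = violations[0], violations[-1]
drained_at = next(i for i in range(15, len(backlog)) if backlog[i] == 0.0)

print(f"cut starts at interval 5, first QoS violation at {first_violation}")
print(f"resources restored at 15, violations persist until {last_violation}")
print(f"queue fully drained at {drained_at}")
```

With these numbers the first violation appears only in the fifth interval of the cut (interval 9). Resources return at interval 15, yet delay stays above the target through interval 32 and the backlog clears only at interval 54: a 10-interval cut costs 40 intervals of draining, because the restored allocation has just 5 requests per interval of headroom.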
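The back-pressure problem under **Inter-tier dependencies** can be illustrated with a hypothetical two-tier chain in which a frontend forwards requests into a bounded buffer in front of a backend. When the backend is the real bottleneck, the buffer fills, the frontend blocks, and the frontend's own queue grows, so a manager that looks at tiers in isolation would scale the wrong one. The capacities, buffer size, and function name are illustrative assumptions.

```python
def frontend_backlog(arrival: float, front_cap: float, back_cap: float,
                     buffer_limit: float, steps: int) -> float:
    """Return the frontend queue length after `steps` intervals.

    The frontend may only forward as many requests as there is free space
    in the bounded buffer feeding the backend; a full buffer blocks it
    (back-pressure).
    """
    front_q = 0.0   # requests waiting at the frontend
    buffer = 0.0    # requests forwarded but not yet served by the backend
    for _ in range(steps):
        # Backend serves from its input buffer.
        buffer -= min(buffer, back_cap)
        # New requests arrive at the frontend.
        front_q += arrival
        # Forwarding is limited by frontend capacity and free buffer space.
        forwarded = min(front_q, front_cap, buffer_limit - buffer)
        front_q -= forwarded
        buffer += forwarded
    return front_q


ARRIVAL, BUFFER, STEPS = 80.0, 100.0, 10
baseline = frontend_backlog(ARRIVAL, 100.0, 60.0, BUFFER, STEPS)
scale_front = frontend_backlog(ARRIVAL, 200.0, 60.0, BUFFER, STEPS)
scale_back = frontend_backlog(ARRIVAL, 100.0, 90.0, BUFFER, STEPS)
print(baseline, scale_front, scale_back)  # 160.0 160.0 0.0
```

The frontend's capacity (100) exceeds its load (80), yet its queue keeps growing. Doubling the frontend leaves that backlog unchanged, while adding backend capacity removes it, which is why the scheduler needs an end-to-end view of dependencies.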
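The hybrid design in the final paragraph pairs a short-horizon latency prediction with a longer-horizon violation probability. The sketch below shows one way such a pair of predictors could drive an allocation decision; the selection rule is an illustrative assumption, not Sinan's implementation. It prunes the candidates to the current allocation plus single-tier moves of ±1 core (\(2N + 1\) candidates instead of the full space), keeps those whose predicted next-interval latency meets the QoS target and whose predicted violation probability stays under a risk threshold, and picks the one using the fewest cores. `toy_latency` and `toy_violation_prob` are hypothetical stand-ins for the CNN and Boosted Trees models; the loads, thresholds, and all names are assumptions.

```python
from typing import Callable, List, Tuple

Allocation = Tuple[int, ...]  # cores per tier (hypothetical representation)


def candidate_actions(current: Allocation, step: int = 1,
                      min_cores: int = 1) -> List[Allocation]:
    """Pruned action space: keep `current`, or move one tier by +/- step cores."""
    candidates = [current]
    for i, cores in enumerate(current):
        for delta in (-step, step):
            if cores + delta >= min_cores:
                candidates.append(current[:i] + (cores + delta,) + current[i + 1:])
    return candidates


def choose_allocation(current: Allocation,
                      predict_latency: Callable[[Allocation], float],
                      predict_violation_prob: Callable[[Allocation], float],
                      qos_target: float, risk_threshold: float) -> Allocation:
    safe = [a for a in candidate_actions(current)
            if predict_latency(a) <= qos_target
            and predict_violation_prob(a) <= risk_threshold]
    if not safe:
        # Nothing looks safe: scale every tier up rather than risk a violation.
        return tuple(c + 1 for c in current)
    # Among safe candidates, prefer the one that uses the fewest cores.
    return min(safe, key=sum)


# Toy stand-ins for the learned models (hypothetical per-tier load, in cores).
LOAD = (4.0, 2.0, 1.0)


def toy_latency(a: Allocation) -> float:
    # Short-horizon view: latency grows as each tier gets fewer cores.
    return sum(load / cores for load, cores in zip(LOAD, a))


def toy_violation_prob(a: Allocation) -> float:
    # Long-horizon view: a tier loaded above its cores will build a queue.
    return 0.9 if any(load > cores for load, cores in zip(LOAD, a)) else 0.1


current = (4, 4, 2)
print(choose_allocation(current, toy_latency, toy_violation_prob, 2.5, 0.2))
print(choose_allocation(current, toy_latency, toy_violation_prob, 2.5, 1.0))
```

With the risk check, the sketch returns `(4, 3, 2)`, trimming a core from the lightly loaded second tier. With the risk threshold relaxed to 1.0 (effectively latency-only), it returns `(3, 4, 2)`: that cut still meets the next-interval target but leaves the busiest tier with less capacity than its load, the kind of aggressive reduction the queuing-effect point warns about.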
