EvoGWP: Predicting Long-Term Changes in Cloud Workloads Using Deep Graph-Evolution Learning

Jialun Li, Jieqian Yao, Danyang Xiao, Diying Yang, Weigang Wu
DOI: https://doi.org/10.1109/tpds.2024.3357715
IF: 5.3
2024-02-14
IEEE Transactions on Parallel and Distributed Systems
Abstract: Workload prediction plays a crucial role in resource management for large-scale cloud datacenters. Although many methods and algorithms have been proposed, long-term changes have not been explicitly identified and considered. Because of shifting user demands, workload relocations, or other reasons, the "resource usage pattern" of a workload, which is usually quite stable in the short term, may change dynamically over the long term. Such long-term dynamic changes can significantly degrade the accuracy of prediction algorithms, and how to handle them remains an open and challenging issue. In this article, we propose Evolution Graph for Workload Prediction (EvoGWP), a novel method that predicts long-term dynamic changes using a carefully designed graph-based evolution learning algorithm. EvoGWP automatically extracts shapelets to explicitly identify the resource usage patterns of workloads at a fine-grained level, and predicts workload changes by considering factors in both the temporal and spatial dimensions. We design a two-level importance-based shapelet extraction mechanism to mine new usage pattern changes in the temporal dimension, and a novel evolution graph model to fuse the interference among resource usage patterns of different workloads in the spatial dimension. By combining the temporal extraction of shapelets from each single workload with the spatial interference of shapelets among different workloads, we then design a spatio-temporal GNN-based encoder-decoder model to predict the long-term dynamic changes of workloads. Experiments using real trace data from Alibaba, Tencent, and Google show that EvoGWP improves prediction accuracy by up to 58.6% over state-of-the-art prediction methods. Moreover, EvoGWP outperforms state-of-the-art prediction methods in terms of model convergence. To the best of our knowledge, this is the first work that explicitly identifies fine-grained workload resource usage patterns to accurately predict long-term dynamic changes of workloads.
computer science, theory & methods; engineering, electrical & electronic