DBAugur: an Adversarial-based Trend Forecasting System for Diversified Workloads.

Yuanning Gao, Xiuqi Huang, Xuanhe Zhou, Xiaofeng Gao, Guoliang Li, Guihai Chen
DOI: https://doi.org/10.1109/icde55515.2023.00385
2023-01-01
Abstract: Trend forecasting is vital for optimizing workload performance, and it becomes even more urgent as the number of applications and database configurations grows. However, DBAs mainly focus on historical workloads and may give suboptimal configuration advice when the workload trends have changed. Although there are some studies on trend forecasting, they have several limitations. First, they mainly predict changes in query numbers, without combining other critical factors (e.g., disk utilization), and so cannot fully reflect future workload trends. In addition, workloads contain numerous queries, and exact clustering algorithms like K-means cannot effectively merge similar queries that contain noise such as time shifts. Second, basic machine learning models like RNNs may have relatively low prediction accuracy on complex workloads (e.g., no cycles but random bursts). Third, real-world workloads may have diverse patterns, while previous models cannot efficiently and reliably predict all the different workload patterns. To address these challenges, we propose a trend forecasting system (DBAugur) that utilizes adversarial neural networks to predict the trends of different workloads. First, DBAugur collects the important features (e.g., queries, resource metrics) to characterize workloads, and reduces the number of involved queries by separately merging similar queries based on SQL semantics and trend patterns. Second, DBAugur utilizes Generative Adversarial Networks (GANs) to capture the latent patterns, correlations between different metrics, and occasional bursts within complicated and time-varying workloads. Moreover, we propose a time-sensitive ensemble algorithm that takes advantage of various machine learning models (e.g., generative models, convolutional models, feed-forward models) to accommodate the various workload patterns.
The experimental results show that DBAugur outperformed state-of-the-art methods on various real-world workloads.
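The abstract's remark that exact clustering such as K-means struggles with time-shifted queries can be illustrated with a small sketch. K-means compares series point by point (Euclidean distance), so a burst that arrives two intervals later looks very different, while an elastic measure aligns the two bursts. Dynamic time warping (DTW) is used here purely as a well-known stand-in for a shift-tolerant measure; the abstract does not state which measure DBAugur uses, and the traces `burst`, `shifted`, and `flat` are made-up data.

```python
import math

def euclidean(a, b):
    """Point-by-point distance, the kind of measure K-means relies on."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference cost.

    Illustrative assumption only: not taken from the DBAugur paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical arrival-rate traces (queries per interval) for three query templates.
burst = [0, 0, 10, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 10, 0, 0, 0]  # same burst, two intervals later
flat = [1, 1, 1, 1, 1, 1, 1, 1]      # a different, steady pattern

print("euclidean burst vs shifted:", euclidean(burst, shifted))
print("euclidean burst vs flat:   ", euclidean(burst, flat))
print("dtw burst vs shifted:      ", dtw(burst, shifted))
print("dtw burst vs flat:         ", dtw(burst, flat))
```

Under Euclidean distance the shifted burst ends up farther from the original than the flat trace does, so a point-wise clustering would tend to group the wrong pair. Under DTW the two bursts align exactly and the flat trace is the distant one.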