Machine Learning for Databases
Guoliang Li, Xuanhe Zhou, Lei Cao
DOI: https://doi.org/10.14778/3476311.3476405
IF: 2.5
2021-01-01
Proceedings of the VLDB Endowment
Abstract: Machine learning techniques have been proposed to optimize databases. Traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view advisors) cannot meet the high-performance requirements of large-scale database instances, various applications, and diversified users, especially on the cloud. Fortunately, machine learning based techniques can alleviate this problem by judiciously selecting optimization strategies. In this tutorial, we categorize database tasks into three typical problems that can be optimized by different machine learning models: NP-hard problems (e.g., knob space exploration, index/view selection, and partition-key recommendation for offline optimization; query rewrite and join order selection for online optimization), regression problems (e.g., cost/cardinality estimation, index/view benefit estimation, query latency prediction), and prediction problems (e.g., query workload prediction). We review existing machine learning based techniques that address these problems and outline research challenges.