GLO: Towards Generalized Learned Query Optimization

Tianyi Chen, Jun Gao, Yaofeng Tu, Mo Xu
DOI: https://doi.org/10.1109/icde60146.2024.00368
2024-01-01
Abstract: In recent years, there has been growing interest in applying deep reinforcement learning (DRL) techniques to query execution plan generation. Although current DRL-based query optimizers achieve competitive performance against traditional methods on specific query workloads, they struggle to generalize to workloads unseen during training. We therefore propose GLO to address these limitations and take a step towards generalized learned query optimization. First, rather than using the ungeneralizable table-specific one-hot labels found in almost all existing work, GLO relies on statistical information from the well-established underlying DBMS along with table patterns extracted via a clustering algorithm, which improves generalization across different scenarios. Second, GLO captures plan information more effectively by integrating Transformer layers into the DRL value model; the deeper network and larger parameter count let the model handle more diverse queries during plan generation. In addition, GLO allows cost estimations from the DBMS to be injected as external knowledge for better generalization. Third, GLO recognizes and replaces disastrously poor plans by comparing generated plans with those produced by the DBMS. We conduct our experiments on composite workloads that combine various query sets, including JOB, Extended JOB, TPC-DS, and Stack. The results demonstrate that GLO outperforms previous state-of-the-art learned optimizers: on TPC-DS, it is 1.4x faster than LOGER and 2.1x faster than Balsa when TPC-DS queries are entirely unseen during training. To the best of our knowledge, GLO is the first learned optimizer that directly generates plans while possessing preliminary generalization ability across different query workloads.
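The third point, replacing disastrously poor plans by comparison with the DBMS plan, can be illustrated with a minimal sketch. The abstract does not specify how the comparison is made, so everything below is an assumption for illustration only: the `Plan` record, the `choose_plan` function, the use of an estimated cost as the comparison signal, and the `max_ratio` threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan record; not a structure defined in the paper."""
    source: str            # "learned" or "dbms"
    estimated_cost: float  # assumed comparison signal, e.g. a DBMS cost estimate


def choose_plan(learned: Plan, dbms: Plan, max_ratio: float = 2.0) -> Plan:
    """Keep the learned plan unless it looks far worse than the DBMS plan.

    The ratio test and the default threshold are illustrative assumptions,
    not the comparison rule used by GLO.
    """
    if learned.estimated_cost > max_ratio * dbms.estimated_cost:
        # The learned plan looks disastrously poor: fall back to the DBMS plan.
        return dbms
    return learned


if __name__ == "__main__":
    dbms_plan = Plan(source="dbms", estimated_cost=100.0)
    print(choose_plan(Plan("learned", 150.0), dbms_plan).source)
    print(choose_plan(Plan("learned", 500.0), dbms_plan).source)
```

Under these assumptions, a learned plan whose estimated cost stays within the threshold is kept, while one that exceeds it is replaced by the DBMS plan, so the learned optimizer never does much worse than the baseline on that signal.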