RLO: a Reinforcement Learning-Based Method for Join Optimization

Xinyi ZHANG, Zhipeng ZHANG, Tieying ZHANG, Bin CUI, Ju FAN
DOI: https://doi.org/10.1360/ssi-2019-0179
2020-01-01
Abstract: Join optimization is one of the most important research problems in database systems. Traditional join optimizers are usually built on heuristics, which are expensive to run and often fail to generate the optimal execution plan. There are two reasons for this. (1) The optimizers are based on heuristics and only explore a subset of the search space. (2) They do not use the history logs and cannot estimate the quality of their generated plans on a specific join problem. To tackle these challenges, we propose RLO, a reinforcement learning-based optimizer for join optimization. We model the join optimization problem as a Markov decision process and use deep $Q$-learning to estimate the expected reward of each candidate operation. To boost the effectiveness of RLO, we further propose a tree-based embedding method to represent the state and use a beam search to avoid missing the optimal plans. We implement RLO in Apache Calcite and Postgres. Extensive experiments demonstrate that: (1) in Apache Calcite, RLO is $10\times$–$56\times$ faster in finding the execution plan and 80% faster in executing the plan than the state-of-the-art heuristics; (2) compared with the native Postgres implementation, RLO can be $14\times$ faster in finding the execution plan and 12.9% faster in an end-to-end comparison.
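To make the formulation concrete, the following is a minimal Python sketch of join ordering viewed as a sequential decision process explored with beam search. It is not the RLO implementation: the table cardinalities, the selectivities, the cost model (sum of intermediate result sizes), and every function name are hypothetical, and the hand-written `state_cost` function merely stands in for the learned $Q$-value estimate described in the abstract. A state is the tuple of subplans built so far, an action joins two of them, and the search keeps the `width` best-scoring states at each step.

```python
from itertools import combinations

# Hypothetical base-table row counts and pairwise join selectivities
# (illustrative values only, not taken from the paper).
CARD = {"A": 1000, "B": 100, "C": 10, "D": 10000}
SEL = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.1,
    frozenset(("C", "D")): 0.001,
}

def tables(plan):
    """Base tables covered by a plan (a table name or a nested pair)."""
    if isinstance(plan, str):
        return frozenset([plan])
    return tables(plan[0]) | tables(plan[1])

def cardinality(plan):
    """Estimated output rows, assuming independent join predicates."""
    if isinstance(plan, str):
        return CARD[plan]
    left, right = plan
    rows = cardinality(left) * cardinality(right)
    for a in tables(left):
        for b in tables(right):
            rows *= SEL.get(frozenset((a, b)), 1.0)  # 1.0 = cross product
    return rows

def cost(plan):
    """Toy cost model: total size of all intermediate results."""
    if isinstance(plan, str):
        return 0.0
    return cost(plan[0]) + cost(plan[1]) + cardinality(plan)

def state_cost(state):
    """Stand-in for a learned value estimate: cost accumulated so far."""
    return sum(cost(p) for p in state)

def step(state, i, j):
    """Apply one action: join subplans i and j of the state."""
    rest = tuple(p for k, p in enumerate(state) if k not in (i, j))
    return rest + ((state[i], state[j]),)

def beam_search(table_names, score, width=2):
    """Keep the `width` lowest-scoring states after every join decision."""
    beam = [tuple(sorted(table_names))]
    while len(beam[0]) > 1:
        candidates = [
            step(s, i, j)
            for s in beam
            for i, j in combinations(range(len(s)), 2)
        ]
        candidates.sort(key=score)
        beam = candidates[:width]
    return beam[0][0]  # the single remaining subplan is the full join tree

if __name__ == "__main__":
    plan = beam_search(CARD, state_cost, width=2)
    print(plan, cost(plan))
```

With `width=1` the search degenerates into a greedy heuristic that commits to the locally cheapest join at every step. A wider beam keeps more partial plans alive, which is the intuition behind using beam search to avoid discarding a plan that looks worse early on but ends up cheaper overall.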