RW-Tree: A Learned Workload-aware Framework for R-tree Construction
Haowen Dong,Chengliang Chai,Yuyu Luo,Jiabin Liu,Jianhua Feng,Chaoqun Zhan
DOI: https://doi.org/10.1109/icde53745.2022.00201
2022-01-01
Abstract:R-tree is a popular index which supports efficient queries on multi-dimensional data. The performance of R-tree mostly depends on how the tree structure is built if new data instances are inserted, which has been studied for years. Existing works can be categorized into two groups. One is the bulk-loading approaches that insert data instances in batch, but they cannot support real-time insertion. Hence, our focus is on the other one that inserts each data instance individually, and thus fresh data can be instantly queried. However, existing methods do not consider the workload information, which leads to limited potential optimization opportunity. Therefore, it is important to study workload-aware R-tree construction for efficient multi-dimensional data access. There are several challenges. First, how to represent the query workload is a challenge. Second, given a workload, it is challenging to accurately measure the benefit of a data insertion choice. Third, both range queries and kNN queries should be considered in the workload. To address these challenges, we propose a novel framework that leverages a learning-based method to solve the workload-aware R-tree construction problem. First, by extracting the query workload features, we learn a distribution for the workload using the space partition. Second, considering the distribution, we design a cost model to describe the benefits (i.e., query execution time) of different insertion choices and select the best one. Third, we convert the kNN queries to range search ones, so as to support the workload including both types of queries. Experimental results show that on OpenStreetMap real datasets, compared with baselines, we improve the query efficiency by 1.17x.