Packing R-trees with Space-filling Curves: Theoretical Optimality, Empirical Efficiency, and Bulk-loading Parallelizability

Jianzhong Qi,Yufei Tao,Yanchuan Chang,Rui Zhang
DOI: https://doi.org/10.1145/3397506
IF: 1.6289
2020-01-01
ACM Transactions on Database Systems
Abstract:The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index management, and over both practical and worst-case workloads. To address this need, we revisit two classic multidimensional access methods-the R-tree and the space-filling curve. We propose a novel R-tree packing strategy based on space-filling curves. This strategy produces R-trees with an asymptotically optimal I/O complexity for window queries in the worst case. Experiments show that our R-trees are highly efficient in querying both real and synthetic data of different distributions. The proposed strategy is also simple to parallelize, since it relies only on sorting. We propose a parallel algorithm for R-tree bulk-loading based on the proposed packing strategy and analyze its performance under the massively parallel communication model. To handle dynamic data updates, we further propose index update algorithms that process data insertions and deletions without compromising the optimal query I/O complexity. Experimental results confirm the effectiveness and efficiency of the proposed R-tree bulk-loading and updating algorithms over large data sets.
What problem does this paper attempt to address?