Abstract: The R-tree is a popular index that supports efficient queries on multi-dimensional data. Its performance depends largely on how the tree structure is built as new data instances are inserted, a problem that has been studied for years. Existing work falls into two groups. The first comprises bulk-loading approaches, which insert data instances in batches but cannot support real-time insertion. Our focus is therefore on the second group, which inserts each data instance individually so that fresh data can be queried immediately. However, existing methods in this group do not consider workload information, which limits the room for optimization. It is therefore important to study workload-aware R-tree construction for efficient multi-dimensional data access. There are three challenges. First, the query workload must be represented. Second, given a workload, it is difficult to accurately measure the benefit of a data insertion choice. Third, the workload should cover both range queries and kNN queries. To address these challenges, we propose a novel framework that leverages a learning-based method to solve the workload-aware R-tree construction problem. First, we extract query workload features and learn a distribution for the workload over a partition of the space. Second, based on this distribution, we design a cost model that describes the benefit (i.e., query execution time) of different insertion choices and selects the best one. Third, we convert kNN queries into range queries so that workloads containing both types of queries are supported. Experimental results on real OpenStreetMap datasets show that, compared with baselines, we improve query efficiency by 1.17x.
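
The three steps named in the abstract can be illustrated with a small sketch. This is not the paper's method: the abstract does not give the learned model or the cost formula, so everything below is an assumption made for illustration. The workload distribution is approximated by query counts on a fixed grid over the unit square, the cost of a node is taken as the workload mass of the grid cells its bounding rectangle overlaps, and a kNN query is turned into a square range query under a uniform-density assumption. All names (`GRID`, `learn_workload`, `access_cost`, `knn_to_range`, `choose_child`) are hypothetical.

```python
from math import pi, sqrt

GRID = 4  # hypothetical resolution: unit square split into GRID x GRID cells


def cell_index(v):
    """Map a coordinate to a grid index, clamping out-of-range values."""
    return max(0, min(int(v * GRID), GRID - 1))


def cells(rect):
    """Yield the (i, j) cells overlapped by rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    for i in range(cell_index(xmin), cell_index(xmax) + 1):
        for j in range(cell_index(ymin), cell_index(ymax) + 1):
            yield i, j


def knn_to_range(point, k, n_points):
    """Assumed conversion: with n_points spread uniformly over the unit square,
    a disc of radius r holds about k points when pi * r^2 * n_points = k."""
    x, y = point
    r = sqrt(k / (pi * n_points))
    return (max(0.0, x - r), max(0.0, y - r), min(1.0, x + r), min(1.0, y + r))


def learn_workload(range_queries):
    """Stand-in for the learned distribution: normalized per-cell query counts."""
    counts = [[0.0] * GRID for _ in range(GRID)]
    for rect in range_queries:
        for i, j in cells(rect):
            counts[i][j] += 1.0
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[c / total for c in row] for row in counts]


def access_cost(rect, workload):
    """Assumed cost: workload mass of the cells that the rectangle overlaps."""
    return sum(workload[i][j] for i, j in cells(rect))


def enlarge(mbr, point):
    """Smallest rectangle covering both mbr and point."""
    x, y = point
    return (min(mbr[0], x), min(mbr[1], y), max(mbr[2], x), max(mbr[3], y))


def area(rect):
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


def choose_child(children, point, workload):
    """Pick the child whose enlargement adds the least workload cost,
    breaking ties by the classic least-area-enlargement rule."""
    def key(idx):
        before, after = children[idx], enlarge(children[idx], point)
        return (access_cost(after, workload) - access_cost(before, workload),
                area(after) - area(before))
    return min(range(len(children)), key=key)


# Usage: queries concentrate in one cell; two candidate child rectangles.
workload = learn_workload([(0.26, 0.26, 0.4, 0.4)])
children = [(0.1, 0.3, 0.2, 0.4), (0.8, 0.6, 0.9, 0.9)]
best = choose_child(children, (0.6, 0.35), workload)
```

In this example the first child needs less area enlargement, but enlarging it would stretch its rectangle across the heavily queried cell, so the workload-based key prefers the second child. A kNN query would enter the same pipeline by passing `knn_to_range(...)` results to `learn_workload` alongside ordinary range queries.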

An Observation Dimension Weight-Based U-Tree Algorithm

No-Fringe U-Tree: An Optimized Algorithm for Reinforcement Learning

The Hierarchical Degree-of-visibility Tree

Tree Based Discretization for Continuous State Space Reinforcement Learning

Weighted Oblique Decision Trees

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

A Search Space Utility Optimization Based Online POMDP Planning Algorithm

RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space

An Optimal Computing Budget Allocation Tree Policy for Monte Carlo Tree Search

FoLDTree: A ULDA-Based Decision Tree Framework for Efficient Oblique Splits and Feature Selection

Observation-Based Optimization for POMDPs with Continuous State, Observation, and Action Spaces

Detecting Occluded and Dense Trees in Urban Terrestrial Views With a High-Quality Tree Detection Dataset

An improved decision tree algorithm based on boundary mixed attribute dependency

Learning Ultrametric Trees for Optimal Transport Regression

Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems

RW-Tree: A Learned Workload-aware Framework for R-tree Construction

BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving

Quantitative Association Analysis Using Tree Hierarchies

Dimension Projection Matrix/Tree: Interactive Subspace Visual Exploration and Analysis of High Dimensional Data

Optimally Ordered Orthogonal Neighbor Joining Trees for Hierarchical Cluster Analysis

Improved Distance Sensitivity Oracles Via Tree Partitioning