Distributed Spatio-Temporal K Nearest Neighbors Join.
Ruiyuan Li, Rubin Wang, Junwen Liu, Zisheng Yu, Huajun He, Tianfu He, Sijie Ruan, Jie Bao, Chao Chen, Fuqiang Gu, Liang Hong, Yu Zheng
DOI: https://doi.org/10.1145/3474717.3484209
2021-01-01
Abstract: The rapid development of positioning technology produces an extremely large volume of spatio-temporal data with various geometry types, such as points, line strings, polygons, or a mixture of them. As one of the most basic but time-consuming operations, the k nearest neighbors join (kNN join) has attracted much attention. However, most existing works on kNN join either ignore temporal information or consider point data only. This paper proposes a novel and useful problem, i.e., ST-kNN join, which considers both spatial closeness and temporal concurrency. To efficiently support ST-kNN join over a huge amount of spatio-temporal data of any geometry type, we propose a novel distributed solution based on Apache Spark. Specifically, our method adopts a two-round join framework. In the first-round join, we propose a new spatio-temporal partitioning method that achieves spatio-temporal locality and load balance at the same time. We also propose a lightweight index structure, i.e., the Time Range Count Index (TRC-index), to enable efficient ST-kNN join. In the second-round join, to reduce data transmission among machines, we remove duplicates based on spatio-temporal reference points before shuffling local results. Extensive experiments on three large real-world datasets show that our method is much more scalable than the baselines and 9X faster. A demonstration system is deployed and the source code is released.
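To make the problem statement concrete, the following is a minimal single-machine sketch of what an ST-kNN join computes. It is not the paper's distributed method: it uses brute force, handles point data only, and assumes that "temporal concurrency" means the two time ranges overlap after the query range is widened by a tolerance `delta`. The names `STPoint`, `overlaps`, `st_knn_join`, and `delta` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from heapq import nsmallest
from math import hypot


@dataclass(frozen=True)
class STPoint:
    """A point object with a location and a time range (illustrative type)."""
    oid: str
    x: float
    y: float
    t_start: float
    t_end: float


def overlaps(r: STPoint, s: STPoint, delta: float = 0.0) -> bool:
    """Assumed concurrency test: r's time range, widened by delta, intersects s's."""
    return r.t_start - delta <= s.t_end and s.t_start <= r.t_end + delta


def st_knn_join(R, S, k, delta=0.0):
    """For each r in R, return the ids of the k spatially nearest objects in S
    among those that are temporally concurrent with r (brute force)."""
    result = {}
    for r in R:
        # Temporal filter first, then rank the survivors by Euclidean distance.
        candidates = [s for s in S if overlaps(r, s, delta)]
        nearest = nsmallest(
            k, candidates, key=lambda s: hypot(r.x - s.x, r.y - s.y)
        )
        result[r.oid] = [s.oid for s in nearest]
    return result


# Small made-up example: one query object and four candidates.
R = [STPoint("r1", 0.0, 0.0, 0.0, 10.0)]
S = [
    STPoint("s1", 1.0, 0.0, 5.0, 6.0),
    STPoint("s2", 0.5, 0.0, 20.0, 30.0),  # spatially closest, but later in time
    STPoint("s3", 2.0, 0.0, 9.0, 12.0),
    STPoint("s4", 3.0, 0.0, 0.0, 1.0),
]
strict = st_knn_join(R, S, k=2)
relaxed = st_knn_join(R, S, k=2, delta=10.0)
```

With `delta=0`, `s2` is excluded despite being the closest in space, because its time range does not overlap that of `r1`; widening the window to `delta=10.0` admits it. The brute-force scan costs O(|R|·|S|), which is the cost the partitioning and indexing described in the abstract aim to avoid.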