SWISP: Distributed Convoy Mining Via Sliding Window-based Indexing and Sub-track Partitioning

Chenxu Wang, Xin Yang, Tianyi Li, Jiaxing Wei, Pinghui Wang, Hongzhen Xiang, Christian S. Jensen
DOI: https://doi.org/10.1109/icde60146.2024.00344
2024-01-01
Abstract: With the widespread deployment of location-aware mobile devices, massive amounts of trajectory data are being generated and collected. Mining co-movement patterns of people and vehicles from massive, streaming trajectory data has attracted much attention because of its applications in many fields. As a typical co-movement pattern, a convoy describes a group of objects that move together over consecutive timestamps. Efficient distributed convoy mining faces two challenges: object clustering and workload balancing. Clustering the objects in each time snapshot is time-consuming. In addition, load balancing is an important consideration for distributed algorithms in practical application scenarios. To tackle these challenges, we propose a novel method for distributed convoy mining via sliding window-based indexing and sub-track partitioning, abbreviated SWISP. We offer three major advancements. First, we develop a grid-based DBSCAN clustering algorithm, named Grid-DBSCAN, for distributed scenarios. It avoids the exhaustive calculation of pairwise distances during neighborhood search and thus improves computational efficiency in the clustering stage. Second, we propose a sliding window-based indexing scheme that filters out sub-tracks with fewer than $k$ consecutive time snapshots, significantly reducing the number of candidate sub-tracks for convoy mining. Third, we develop a distributed convoy mining algorithm based on sub-track partitioning. It exploits both the temporal and the spatial information of sub-tracks for data partitioning and solves the data skewness problem caused by uneven data distributions. We conduct extensive experiments on four real-world datasets. The experimental results show that our distributed algorithm can handle large-scale trajectory data and is more efficient than state-of-the-art approaches.
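To illustrate two ideas mentioned in the abstract, here is a minimal Python sketch. It is not the authors' Grid-DBSCAN or SWISP implementation; it only shows the general technique under stated assumptions. The function names `grid_neighbors` and `consecutive_runs`, the choice of a grid cell size equal to the DBSCAN radius `eps`, and the use of integer timestamps are all hypothetical choices made for this example. The first function restricts the neighborhood search to the 3x3 block of cells around each object instead of comparing all pairs; the second keeps only runs of at least `k` consecutive timestamps.

```python
from collections import defaultdict
from math import floor, hypot


def grid_neighbors(points, eps):
    """Map each object id to the ids within distance eps.

    Assumption: points is {object_id: (x, y)} for one time snapshot,
    and the grid cell size equals eps, so every neighbor of an object
    lies in its own cell or one of the 8 adjacent cells.
    """
    cells = defaultdict(list)
    for oid, (x, y) in points.items():
        cells[(floor(x / eps), floor(y / eps))].append(oid)

    neighbors = {oid: set() for oid in points}
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty cells in the defaultdict
                for other in cells.get((cx + dx, cy + dy), ()):
                    for oid in members:
                        if oid == other:
                            continue
                        (x1, y1), (x2, y2) = points[oid], points[other]
                        if hypot(x1 - x2, y1 - y2) <= eps:
                            neighbors[oid].add(other)
    return neighbors


def consecutive_runs(timestamps, k):
    """Split integer timestamps into maximal runs of consecutive values
    and keep only the runs with at least k timestamps."""
    runs, current = [], []
    for t in sorted(set(timestamps)):
        if current and t != current[-1] + 1:
            if len(current) >= k:
                runs.append(current)
            current = []
        current.append(t)
    if len(current) >= k:
        runs.append(current)
    return runs
```

For example, `consecutive_runs([1, 2, 3, 5, 6, 8, 9, 10, 11], 3)` keeps the runs `[1, 2, 3]` and `[8, 9, 10, 11]` and drops `[5, 6]`, which is too short to contribute to a convoy of length 3.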