A Parallel SP-DBSCAN Algorithm on Spark for Waiting Spot Recommendation

Xia Dawen, Bai Yu, Zheng Yongling, Hu Yang, Li Yantao, Li Huaqing
DOI: https://doi.org/10.1007/s11042-021-11639-9
IF: 2.577
2021-01-01
Multimedia Tools and Applications
Abstract: Recommending taxi waiting spots to mobile passengers in complex urban transportation networks is challenging. Traditional centralized mining platforms cannot handle the storage and computation demands of big GPS trajectory data, and identifying cluster boundaries with DBSCAN is especially difficult on the Spark parallel processing framework. To this end, we propose SP-DBSCAN, a parallel DBSCAN optimization algorithm on Spark that uses the silhouette coefficient and the pickup rate, so that users need to input only one parameter to obtain a distributed recommendation of the best waiting spot. Specifically, on the Hadoop distributed computing platform, we develop a general distributed modeling framework for waiting spot recommendation on Spark, which resolves the distributed storage and parallel computing limitations that a serial, single-machine algorithm faces when partitioning and clustering large-scale traffic data. Moreover, we put forward the parallel SP-DBSCAN algorithm on Spark to recommend the best waiting spot for passengers: the traditional DBSCAN algorithm is optimized via the silhouette coefficient and the pickup rate (boarding ratio) to address its parameter sensitivity, and the problem of locating the center of a non-convex cluster is solved by assigning two centroids to a single cluster in hotspot areas. Finally, experimental results on four groups of real-world taxi GPS trajectory data sets demonstrate that, compared with C-DBSCAN and P-DBSCAN, the recognition rate of SP-DBSCAN increases by 1.6%, 6.2%, 3.47%, and 5.8%, respectively. The empirical study indicates that the clustering regions generated by SP-DBSCAN meet the requirement that passengers who fail to hail a taxi at one location and then move randomly to the next spot can still board within the hotspot area.
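The abstract says SP-DBSCAN tunes DBSCAN with the silhouette coefficient (together with the pickup rate) so that users supply only one parameter, but it does not give the exact procedure. The following is therefore only a minimal serial sketch of the silhouette half of that idea, assuming (hypothetically) that the user fixes `min_pts` and that `eps` is chosen from a candidate list by the highest mean silhouette coefficient over non-noise points. It leaves out the pickup-rate criterion, the two-centroid step, and the Spark/Hadoop partitioning entirely; `dbscan`, `silhouette`, and `choose_eps` are illustrative names, not the authors' API.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i], including i."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if math.hypot(x - xi, y - yi) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain serial DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds, k = list(neighbors), 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return [int(lab) for lab in labels]


def silhouette(points: Sequence[Point], labels: List[int]) -> Optional[float]:
    """Mean silhouette coefficient over non-noise points; None if fewer than 2 clusters."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return None

    def dist(a: int, b: int) -> float:
        return math.hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])

    scores: List[float] = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = sum(dist(i, j) for j in members if j != i) / (len(members) - 1)
            b = min(sum(dist(i, j) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_eps(points: Sequence[Point], candidates: Sequence[float],
               min_pts: int) -> Optional[Tuple[float, float, List[int]]]:
    """Hypothetical tuner: keep the eps whose clustering has the best silhouette."""
    best: Optional[Tuple[float, float, List[int]]] = None
    for eps in candidates:
        labels = dbscan(points, eps, min_pts)
        score = silhouette(points, labels)
        if score is not None and (best is None or score > best[1]):
            best = (eps, score, labels)  # ties keep the earlier candidate
    return best


if __name__ == "__main__":
    # Toy (x, y) data, not from the paper: two tight groups plus one outlier.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
           (10.0, 0.0)]
    print(choose_eps(pts, candidates=[0.05, 0.2, 1.0, 8.0], min_pts=3))
```

On this toy data, `eps=0.05` marks every point as noise and `eps=8.0` merges everything into one cluster, so neither yields a silhouette score; `0.2` and `1.0` produce the same two clusters with the outlier as noise, and the tie goes to the first candidate. A real Spark version would also need to handle clusters that straddle partition boundaries, which is the difficulty the abstract highlights.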