Abstract: Recommending taxi waiting spots to mobile passengers in complex urban transportation networks is challenging because traditional centralized mining platforms cannot handle the storage and computation demands of big GPS trajectory data, and because DBSCAN cluster boundaries are especially difficult to identify on the Spark parallel processing framework. To this end, we propose SP-DBSCAN, a parallel DBSCAN optimization algorithm on Spark that uses the silhouette coefficient and the pickup rate, where users input only one parameter to complete the distributed recommendation of the best waiting spot. Specifically, on the Hadoop distributed computing platform, we develop a general framework for distributed modeling of waiting spot recommendation on Spark, which addresses the storage and computing limits that a serial algorithm faces when partitioning and clustering large-scale traffic data on a single machine. Moreover, we put forward a parallel SP-DBSCAN algorithm on Spark to recommend the best waiting spot for passengers: the traditional DBSCAN algorithm is optimized via the silhouette coefficient and the pickup rate to address its parameter sensitivity, and the problem of locating the center of a non-convex cluster is solved by assigning two centroids to a single cluster in the clustering hotspot areas. Finally, experimental results on four groups of real-world taxi GPS trajectory data sets demonstrate that, compared with C-DBSCAN and P-DBSCAN, the recognition rate of SP-DBSCAN increases by 1.6%, 6.2%, 3.47%, and 5.8%, respectively. The empirical study indicates that the clustering regions generated by SP-DBSCAN let passengers who fail to hail a taxi at a specific location still get a ride within the hotspot area after moving to a randomly chosen next spot.
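The idea of using the silhouette coefficient to reduce DBSCAN's parameter sensitivity can be illustrated with a minimal serial sketch. This is not the SP-DBSCAN algorithm itself: it runs on a single machine rather than on Spark, it ignores the pickup rate and the two-centroid step, and the function names (`dbscan`, `silhouette`, `pick_eps`), the candidate radii, and the choice of `min_pts` as the single user-supplied parameter are all assumptions made for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain serial DBSCAN; returns one label per point (-1 means noise)."""
    n = len(points)
    # Brute-force neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand only through core points
    return labels


def silhouette(points, labels):
    """Mean silhouette over non-noise points; 0.0 with fewer than 2 clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def pick_eps(points, candidates, min_pts):
    """Return the candidate radius whose clustering scores the best silhouette."""
    return max(
        candidates,
        key=lambda eps: silhouette(points, dbscan(points, eps, min_pts)),
    )


# Hypothetical toy data: two small groups of pickup coordinates.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
best_eps = pick_eps(points, candidates=[0.5, 1.5, 20.0], min_pts=3)
```

On this toy data the smallest radius marks every point as noise and the largest merges everything into one cluster, so both score 0.0 and the middle radius is selected. A distributed version would additionally need to partition the data and merge clusters across partition boundaries, which this sketch does not attempt.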

A Parallel DBSCAN Algorithm Based on Spark

A Parallel Adaptive DBSCAN Algorithm Based on k-Dimensional Tree Partition

Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform

Design and Implementation of Parallel DBSCAN Algorithm Based on Spark

A Parallel Varied Density-Based Clustering Algorithm with Optimized Data Partition

DBSCAN-PSM: An Improvement Method of DBSCAN Algorithm on Spark

An Improvement Method of DBSCAN Algorithm on Cloud Computing

DBSCAN Optimization Algorithm Based on KD-tree Partitioning in Cloud Computing

A MapReduce-Based Improvement Algorithm for DBSCAN

Parallel Algorithm for Discovering Communities in Large-Scale Complex Networks

A Parallel SP-DBSCAN Algorithm on Spark for Waiting Spot Recommendation

An Improved K-means Distributed Clustering Algorithm Based on Spark Parallel Computing Framework

An Effective High-Performance Multiway Spatial Join Algorithm with Spark

RT-DBSCAN: Real-Time Parallel Clustering of Spatio-Temporal Data Using Spark-Streaming

Data Mining Algorithm for Cloud Network Information Based on Artificial Intelligence Decision Mechanism

A New Design of High-Performance Large-Scale GIS Computing at a Finer Spatial Granularity: A Case Study of Spatial Join with Spark for Sustainability

Parallel Division Clustering Algorithm Based on Spark Framework and ASPSO

A Parallel Graph Data Analysis System Based on Spark

PS-DBSCAN: An Efficient Parallel DBSCAN Algorithm Based on Platform of AI (PAI)

Approaches for Scaling DBSCAN Algorithm to Large Spatial Databases

A Survey and Experimental Review on Data Distribution Strategies for Parallel Spatial Clustering Algorithms