Abstract: Recommending taxi waiting spots to mobile passengers in complex urban transportation networks is challenging because traditional centralized mining platforms cannot handle the storage and computation demands of large-scale GPS trajectory data, and because DBSCAN cluster boundaries are difficult to identify on the Spark parallel processing framework. To this end, we propose SP-DBSCAN, a parallel DBSCAN optimization algorithm on Spark based on the silhouette coefficient and the pickup rate, in which users input only one parameter to complete the distributed recommendation of the best waiting spot. Specifically, on the Hadoop distributed computing platform, we develop a general framework for distributed modeling of waiting spot recommendation on Spark, which addresses the storage and computing limits that a serial algorithm faces when partitioning and clustering large-scale traffic data on a single machine. Moreover, we put forward a parallel SP-DBSCAN algorithm on Spark to recommend the best waiting spot for passengers: the traditional DBSCAN algorithm is optimized via the silhouette coefficient and the boarding ratio to address its parameter sensitivity, and the problem of locating the center of a non-convex cluster is solved by assigning two centroids to a single cluster in hotspot areas. Finally, experimental results on four groups of real-world taxi GPS trajectory data sets demonstrate that, compared with C-DBSCAN and P-DBSCAN, SP-DBSCAN increases the recognition rate by 1.6%, 6.2%, 3.47%, and 5.8%, respectively. The empirical study indicates that the clustering regions generated by SP-DBSCAN allow passengers who fail to get a taxi at a specific location, and then move to a random next spot, to still find a ride within the hotspot area.
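The abstract says that DBSCAN's parameter sensitivity is addressed with the silhouette coefficient, but gives no procedure. The following is a minimal serial sketch of that general idea only, not the authors' SP-DBSCAN: it assumes the single user-supplied parameter is `min_pts` and picks the neighborhood radius `eps` from a candidate list by maximizing the mean silhouette coefficient. The pickup rate, the two-centroid step, and Spark parallelism are all left out, and the function names (`dbscan`, `silhouette`, `select_eps`) and the toy points are hypothetical.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain serial DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient over clustered (non-noise) points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0  # assumption: treat fewer than two clusters as worst case
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def select_eps(points, min_pts, candidates):
    """Return the candidate eps whose clustering has the highest silhouette."""
    return max(candidates,
               key=lambda eps: silhouette(points, dbscan(points, eps, min_pts)))

# Toy pickup locations: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
best_eps = select_eps(points, min_pts=3, candidates=[0.5, 1.5, 20.0])
```

On this toy data, a radius that is too small marks every point as noise and one that is too large merges everything into a single cluster, so only the middle candidate yields two clusters with a defined silhouette score. A distributed version would additionally need to partition the data and merge clusters across partition boundaries, which this sketch does not attempt.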

Design and Implementation of Parallel DBSCAN Algorithm Based on Spark

A Parallel DBSCAN Algorithm Based on Spark

Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform

A Parallel Adaptive DBSCAN Algorithm Based on k-Dimensional Tree Partition

Parallel spectral clustering algorithm

An Improvement Method of DBSCAN Algorithm on Cloud Computing

A Parallel Graph Data Analysis System Based on Spark

A Parallel Varied Density-Based Clustering Algorithm with Optimized Data Partition

DBSCAN-PSM: an Improvement Method of DBSCAN Algorithm on Spark

DBSCAN Optimization Algorithm Based on KD-tree Partitioning in Cloud Computing

The parallel algorithms for LIBSVM parameter optimization based on Spark

An Improved K-means Distributed Clustering Algorithm Based on Spark Parallel Computing Framework

An Effective High-Performance Multiway Spatial Join Algorithm with Spark

A MapReduce-based improvement algorithm for DBSCAN

Data Mining Algorithm for Cloud Network Information Based on Artificial Intelligence Decision Mechanism

Parallelization of Classification Algorithms Based on SparkR

Memory optimization of Spark parallel computing framework

A Parallel SP-DBSCAN Algorithm on Spark for Waiting Spot Recommendation

Optimization of collaborative filtering algorithm based on DAG Spark scheduling

Data balancing-based intermediate data partitioning and check point-based cache recovery in Spark environment

Study of ELM Algorithm Parallelization Based on Spark