Abstract: Clustering, a branch of data mining and an unsupervised learning scheme, assigns the objects in a dataset to groups, called clusters, without any prior knowledge. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most widely used clustering algorithms for spatial datasets; it can detect clusters of arbitrary shape and automatically identify noise points. However, DBSCAN has several troublesome limitations: (1) its performance depends on two user-specified parameters, ε and MinPts, where ε is the maximum radius of the neighborhood around an observed point and MinPts is the minimum number of data points that such a neighborhood must contain; (2) searching for the nearest neighbors of each object during cluster expansion is prohibitively time-consuming; (3) different starting points can lead to quite different results; and (4) DBSCAN cannot identify adjacent clusters of different densities. Beyond these limitations, the identification of border points is often ignored. In this paper, we address the above problems. First, we improve the traditional locality-sensitive hashing method to enable fast nearest-neighbor queries. Second, we redefine several definitions on the basis of the influence space of each object, which takes both its nearest neighbors and its reverse nearest neighbors into account. The influence space is proved to be sensitive to local density changes, which reduces the number of parameters and allows adjacent clusters of different densities to be identified. Moreover, the new relationship based on the influence space makes the algorithm insensitive to the order in which points are input. Finally, we put forward a new concept based on the influence space, core density reachable, which aims to distinguish border objects from noise objects. Several experiments demonstrate that the proposed algorithm performs better than the traditional DBSCAN algorithm and the improved algorithm IS-DBSCAN.
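
To make the influence-space idea concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not give a formal definition, so the sketch assumes the influence space of a point is the intersection of its k nearest neighbors and its reverse k nearest neighbors. It also uses a brute-force neighbor search in place of the improved locality-sensitive hashing, and the names `knn` and `influence_space` are hypothetical.

```python
from math import dist


def knn(points, k):
    """Brute-force k nearest neighbors: one set of neighbor indices per point."""
    result = []
    for i, p in enumerate(points):
        # Sort all other points by distance; break ties by index for determinism.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        result.append(set(others[:k]))
    return result


def influence_space(points, k):
    """Assumed definition: IS_k(p) = kNN(p) intersected with reverse-kNN(p)."""
    nn = knn(points, k)
    # Reverse kNN: j belongs to rnn[i] when i is one of j's k nearest neighbors.
    rnn = [set() for _ in points]
    for j, neighbors in enumerate(nn):
        for i in neighbors:
            rnn[i].add(j)
    return [nn[i] & rnn[i] for i in range(len(points))]


# Four points forming a dense unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
spaces = influence_space(points, k=2)
for index, space in enumerate(spaces):
    print(index, sorted(space))
```

In this toy dataset the distant point still has two nearest neighbors, but no other point lists it among its own, so its influence space is empty. This asymmetry is what lets a neighborhood defined this way react to local density changes with only the single parameter k.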
