Abstract: This study proposes a new clustering algorithm named ANDClust to handle datasets with varying density and neck‐typed clusters. In the proposed algorithm, an Adaptive Neighborhood Distance (AND) ratio is used to weight the distance between data pairs, so that the algorithm behaves as if the distance parameter differs for each data pair in the dataset. This lets the approach support varying density not only among clusters but also inside a cluster. Although density‐based clustering algorithms can successfully define clusters of arbitrary shape, they encounter issues if the dataset has varying densities or neck‐typed clusters, because they require precise distance parameters, such as the eps parameter of DBSCAN. These approaches assume that data density is homogeneous, but this is rarely the case in practice. In this study, a new clustering algorithm named ANDClust (Adaptive Neighborhood Distance‐based Clustering Algorithm) is proposed to handle datasets with varying density and/or neck‐typed clusters. The algorithm consists of three parts. The first part uses Multivariate Kernel Density Estimation (MulKDE) to find the dataset's peak points, which are the start points for the Minimum Spanning Tree (MST) that constructs clusters in the second part. Lastly, an Adaptive Neighborhood Distance (AND) ratio is used to weight the distance between the data pairs. This enables the approach to support inter‐cluster and intra‐cluster density variation by acting as if the distance parameter differs for each data point of the dataset. ANDClust is tested on synthetic and real datasets to assess its efficiency. The algorithm shows superior clustering quality with a good run time compared to its competitors. Moreover, ANDClust can effectively define clusters of arbitrary shapes and process high‐dimensional, imbalanced datasets that may contain outliers.
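
The idea of a distance threshold that adapts to each data point can be illustrated with a small sketch. This is not the authors' ANDClust implementation: it replaces the MST construction with plain density-ordered region growing, uses an unnormalised Gaussian kernel in place of MulKDE, and the function names and the parameters `k`, `ratio`, and `bandwidth` are hypothetical choices made only for this illustration.

```python
import math


def knn_mean_distance(points, k):
    """Mean distance from each point to its k nearest neighbours (local scale)."""
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(sum(dists[:k]) / k)
    return scales


def gaussian_kde_density(points, bandwidth):
    """Unnormalised Gaussian kernel density estimate evaluated at every point."""
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        for p in points
    ]


def adaptive_cluster(points, k=3, ratio=2.0, bandwidth=1.0):
    """Grow clusters from density peaks using a per-point distance threshold.

    Assumption for this sketch: a point j joins the cluster of point i when
    dist(i, j) <= ratio * (mean k-NN distance of i), so the effective
    threshold is small in dense regions and large in sparse ones.
    """
    scales = knn_mean_distance(points, k)
    density = gaussian_kde_density(points, bandwidth)
    labels = [-1] * len(points)
    # Visit points from highest to lowest estimated density.
    order = sorted(range(len(points)), key=lambda i: -density[i])
    cluster = 0
    for seed in order:
        if labels[seed] != -1:
            continue  # already absorbed by an earlier, denser seed
        labels[seed] = cluster
        frontier = [seed]
        while frontier:
            i = frontier.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= ratio * scales[i]:
                    labels[j] = cluster
                    frontier.append(j)
        cluster += 1
    return labels


# One tight group (spacing 0.1) and one loose group (spacing 2.0).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (10, 10), (12, 10), (10, 12), (12, 12)]
labels = adaptive_cluster(points, k=2, ratio=1.5)
```

In this toy dataset the effective threshold is about 0.15 for the tight group and 3.0 for the loose group, so each group is kept whole without choosing a single global distance parameter such as DBSCAN's eps.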

ANDClust: An Adaptive Neighborhood Distance‐Based Clustering Algorithm to Cluster Varying Density and/or Neck‐Typed Datasets
