Abstract: Cognitive computing involves discovering hidden rules and patterns in massive volumes of data. Density peaks clustering (DPC) is a powerful data mining tool that identifies density peaks in a decision graph and assigns labels to the remaining points without iteration, so it can detect clusters of arbitrary shape simply and efficiently. However, DPC has two weaknesses. First, density measured with the ε-neighborhood or a Gaussian kernel reflects only the global structure of the data, so the correct density peaks may not be found and performance on manifold datasets degrades. Second, the one-step allocation strategy causes a chain reaction: once a high-density point is misallocated, a series of subsequent points is assigned incorrectly as well. To address these problems, this paper uses the Jaccard coefficient to measure the similarity between points. The proposed density measure depends only on the k points that are most similar to a given point, so it reflects the local structure of manifold datasets and allows the density peaks to be identified accurately. To counter the chain reaction caused by the assignment strategy of DPC, we develop a two-step allocation strategy based on label propagation and the proposed similarity measure. The first step assigns labels to points close to the cluster centers; these points play the role of the labeled points in the label propagation algorithm. The second step labels each remaining point according to the labeled point nearest to it. We compared the proposed algorithm with four other algorithms on synthetic and real-world datasets. The clustering results on the synthetic datasets verify the effectiveness of the proposed method on manifold datasets, and three evaluation metrics on the UCI datasets and the Olivetti Faces dataset show that it outperforms the other algorithms and can reveal the patterns and associations in real-world data.
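To make the density idea concrete, the following is a minimal sketch and not the paper's exact formulation. It assumes that the Jaccard coefficient is taken between the k-nearest-neighbor sets of two points and that the local density of a point is the sum of its k largest similarities; the abstract states only that the density depends on the k most similar points. The function names `knn_sets`, `jaccard_similarity_matrix`, and `local_density` are hypothetical.

```python
import numpy as np

def knn_sets(X, k):
    """Return, for each point, the index set of its k nearest neighbors (itself excluded)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]

def jaccard_similarity_matrix(neighbors):
    """Assumed similarity: |N(i) & N(j)| / |N(i) | N(j)| over neighbor sets."""
    n = len(neighbors)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = len(neighbors[i] | neighbors[j])
            inter = len(neighbors[i] & neighbors[j])
            S[i, j] = S[j, i] = inter / union if union else 0.0
    return S

def local_density(S, k):
    """Assumed density: sum of the k largest similarities of each point."""
    top_k = np.sort(S, axis=1)[:, -k:]
    return top_k.sum(axis=1)

# Two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
S = jaccard_similarity_matrix(knn_sets(X, k=2))
rho = local_density(S, k=2)
```

Because the similarity is built from neighbor sets rather than raw distances, points in different groups share no neighbors and get similarity 0, which is the sense in which such a measure captures local rather than global structure.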
