Abstract: Cognitive computing involves discovering hidden rules and patterns in massive volumes of data. Density peaks clustering (DPC) is a powerful data mining tool that identifies density peaks in a decision graph and assigns labels without requiring iterations. It detects clusters of arbitrary shape simply and efficiently. However, DPC has two weaknesses. First, density measured with the ϵ-neighborhood or a Gaussian kernel reflects only the global structure of the data, so the correct density peaks may not be found and performance on manifold datasets is weakened. Second, the one-step allocation strategy can cause a chain reaction: once a high-density point is misallocated, a series of points is incorrectly assigned as a result. To address these problems, this paper uses the Jaccard coefficient to measure the similarity between points. The proposed density measurement based on the Jaccard coefficient depends only on the k points that are most similar to the given point, so it reflects the local structure of manifold datasets and allows the density peaks to be identified accurately. To counter the chain reaction caused by the assignment strategy of DPC, we develop a two-step allocation strategy based on label propagation and the proposed similarity measure. The first step assigns labels to points close to the cluster centers; these points then play the role of the labeled points in the label propagation algorithm. The second step assigns labels to the remaining points according to the labeled point nearest to each unassigned sample. We compared the proposed algorithm with four other algorithms on synthetic and real-world datasets, and three evaluation metrics show that the proposed algorithm outperforms the others.
The results of clustering on synthetic datasets verified the effectiveness of the proposed method for manifold datasets, and three metrics on the UCI datasets and the Olivetti Faces dataset show that it can reveal the patterns and associations of real-world datasets.
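
The abstract does not give the exact formulas, so the following is only a minimal sketch of how a Jaccard-based local density could be computed. It assumes that the Jaccard coefficient is taken between the k-nearest-neighbor sets of two points and that the density of a point is the sum of its k largest similarities; both choices, and the helper names `knn_sets`, `jaccard_similarity`, and `local_density`, are illustrative assumptions rather than the authors' definitions.

```python
import numpy as np

def knn_sets(X, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    # Pairwise Euclidean distances (O(n^2) memory, fine for small examples).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]

def jaccard_similarity(neigh):
    """Jaccard coefficient |A & B| / |A | B| between neighbor sets (assumed form)."""
    n = len(neigh)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            inter = len(neigh[i] & neigh[j])
            union = len(neigh[i] | neigh[j])
            S[i, j] = S[j, i] = inter / union
    return S

def local_density(S, k):
    """Assumed density: sum of the k largest similarities of each point."""
    top = np.sort(S, axis=1)[:, -k:]
    return top.sum(axis=1)

# Two well-separated unit squares as a toy dataset.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
neigh = knn_sets(X, k=2)
S = jaccard_similarity(neigh)
rho = local_density(S, k=2)
```

In this toy example the similarity between points of different squares is zero because their neighbor sets never overlap, which illustrates why such a density depends on local rather than global structure.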

Prime Discriminant Simplicial Complex

Nearest Prime Simplicial Complex for Object Recognition

Discrimination Reveals Reconstructability of Multiplex Networks from Partial Observations

Representing Data by a Mixture of Activated Simplices

Discrimination universally determines reconstruction of multiplex networks

A generalized simplicial model and its application

Cross-Matched Interval Prevalence of High Dimensional Point Clouds

Density Peaks Clustering Based on Jaccard Similarity and Label Propagation

Discriminative learning by sparse representation for classification

Full reconstruction of simplicial complexes from binary contagion and Ising data

Classifiability-Based Optimal Discriminatory Projection Pursuit

A New Approach to Discover Interlacing Data Structures in High-Dimensional Space

Discriminant Local Information Distance Preserving Projection for Set Classification

Distributed Sparse Multicategory Discriminant Analysis

Random walks on simplicial complexes

Persistent Homology of Geospatial Data: A Case Study with Voting

Persistent hypergraph homology and its applications

Persistent Homology via Ellipsoids

Clustering with Simplicial Complexes

Eigenvector centrality in simplicial complexes of hypergraphs