Abstract: Most clustering methods require not only the number of clusters to be set in advance, but also various hyperparameters such as the initial centroids, the number of nearest neighbours, the minimum number of points, the neighbourhood radius, and the cutoff distance. Although clustering is one of the most promising unsupervised learning approaches in machine intelligence, existing methods cannot simultaneously handle clusters of arbitrary shape, differing density, distinct size, and mutual overlap. Background outliers and high dimensionality make the problem more challenging still. In this paper, we propose a novel universal clustering methodology, called G2-SCANN, which yields the best clustering performance on all 30 synthetic and real datasets without any hyperparameter tuning, provided the exact number of clusters is known. First, the shortest path length (SPL) in a complex network, that is, the graph-based geodesic distance, is used to give a locally backbone-structured description of graph vertex similarity. Accordingly, the SPL-weighted local degree (SLD) is defined as a vertex attribute of an SPL-weighted graph expressed by a G2-SPL adjacency matrix with ε-natural neighbourhood. Second, calculating the SLD for every data point in a bottom-up way directly divides the complete graph formed by all data points into a group of SLD trees. This provides interpretability and allows lone trees to be eliminated. Third, contrastive learning over the largest SLD values is used to find the root vertex of each divisive tree, and a top-down category message is then transmitted from each root vertex to all the leaf vertices of its SLD tree, which eventually produces tree-like clusters. Overall, the proposed G2-SCANN method leverages both the local neighbouring similarity of data points and global information about the data distribution, which allows it to perform better than other methods.
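The idea of using a shortest path length as a graph-based geodesic distance can be illustrated with a minimal sketch. It is not the G2-SCANN implementation: the fixed `radius` neighbourhood stands in for the ε-natural neighbourhood, the function names `neighbourhood_graph` and `shortest_path_lengths` are hypothetical, and the SLD and tree-building steps are not reproduced because the abstract does not define them. The sketch only shows how a path length along the graph can differ from the straight-line distance.

```python
import heapq
import math


def neighbourhood_graph(points, radius):
    """Link every pair of points whose Euclidean distance is <= radius.

    Assumption: a fixed radius replaces the paper's neighbourhood rule.
    """
    graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= radius:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: SPL from `source` to every vertex."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Five points along an L-shaped curve.
points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
graph = neighbourhood_graph(points, radius=1.0)
spl = shortest_path_lengths(graph, source=0)

print(spl[4])                            # path length along the curve
print(math.dist(points[0], points[4]))   # straight-line distance
```

On this L-shaped example the path between the two end points follows the curve, so its length exceeds the Euclidean distance between them. Unreachable vertices keep a distance of `math.inf`.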

Clustering Description Extraction Based on Statistical Machine Learning

Document Clustering Using Locality Preserving Indexing

A Method of Data Mining Based on SOM Clustering and Its Application

A Clustering Algorithm Based on Mathematical Morphology

Deep Descriptive Clustering

Labeling Clusters from Both Linguistic and Statistical Perspectives: A Hybrid Approach

Co-Clustering With Manifold And Double Sparse Representation

Clustering Algorithms Used in Data Mining

Solutions to General Clustering Algorithmic Issues

Clustering In Knowledge Embedded Space

Interpretable Clustering: A Survey

Towards Explainable Clustering: A Constrained Declarative based Approach

Deep image clustering: A survey

Image Annotations Based on Semi-supervised Clustering with Semantic Soft Constraints

G2-SCANN: Gaussian-kernel Graph-Based SLD Clustering Algorithm with Natural Neighbourhood

Clustering-Guided Sparse Structural Learning for Unsupervised Feature Selection

Study on Meaningful String Extraction Algorithm for Improving Webpage Classification

A comprehensive framework for explainable cluster analysis

A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions

Document Clustering Based on Word Sense Cluster

Clustering of Chinese Sentences Using the SMM Model