Abstract: This paper presents a novel clustering algorithm from the SPINEX (Similarity-based Predictions with Explainable Neighbors Exploration) algorithmic family. The newly proposed clustering variant leverages the concept of similarity and higher-order interactions across multiple subspaces to group data into clusters. To showcase the merit of SPINEX, a thorough set of benchmarking experiments was carried out against 13 algorithms, namely, Affinity Propagation, Agglomerative, Birch, DBSCAN, Gaussian Mixture, HDBSCAN, K-Means, KMedoids, Mean Shift, MiniBatch K-Means, OPTICS, Spectral Clustering, and Ward Hierarchical. Then, the performance of all algorithms was examined across 51 synthetic and real datasets from various domains, dimensions, and complexities. Furthermore, we present a companion complexity analysis to compare the complexity of SPINEX to that of the aforementioned algorithms. Our results demonstrate that SPINEX can outperform commonly adopted clustering algorithms by ranking within the top-5 best performing algorithms and has moderate complexity. Finally, a demonstration of the explainability capabilities of SPINEX, along with future research needs, is presented.

What problem does this paper attempt to address?

The paper introduces a novel clustering algorithm from the SPINEX (Similarity-based Predictions with Explainable Neighbors Exploration) family. Traditional clustering algorithms face challenges on complex datasets, including parameter selection, high-dimensional data processing, and a lack of interpretability. SPINEX groups data using similarity and higher-order interactions across multiple subspaces, aiming to overcome these challenges. The paper benchmarks SPINEX against 13 other clustering algorithms on 51 synthetic and real-world datasets and includes a companion complexity analysis. The results show that SPINEX ranks among the top five algorithms, has moderate complexity, and provides interpretability, which is crucial for applications that require explaining results.

The features of SPINEX include:

1. Using multiple similarity measures (such as correlation, Spearman's rank correlation, kernel similarity, and cosine similarity) to adapt to different types and distributions of data.
2. Employing an adaptive method that adjusts parameters dynamically and handles both pre-defined and undefined numbers of clusters.
3. Using a multi-level clustering approach to identify hierarchy within the data.
4. Emphasizing interpretability by providing insights into how clusters are formed, which aids in understanding and interpreting results.
5. Handling noise and outliers, with scalability to large datasets.

The algorithm workflow consists of initialization, similarity matrix calculation, threshold setting, similarity-based clustering, performance evaluation, and selection of the best clustering. In addition, SPINEX provides feature contribution analysis and nearest neighbor analysis to enhance its interpretability. Through these characteristics, SPINEX aims to improve clustering performance, especially on complex and high-dimensional datasets.
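The workflow named above (similarity matrix, threshold setting, similarity-based clustering, evaluation, selection of the best clustering) can be illustrated with a minimal sketch. This is not the authors' SPINEX implementation: it assumes cosine similarity only, takes candidate thresholds from percentiles of the pairwise similarities, forms clusters as connected components of the thresholded similarity graph, and scores each candidate with a simple within-minus-between similarity gap. All function names, the percentile grid, and the scoring rule are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    Xn = X / norms
    return Xn @ Xn.T


def threshold_clusters(S, threshold):
    """Label connected components of the graph with edges where S >= threshold."""
    n = S.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in np.flatnonzero(S[i] >= threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def separation_score(S, labels):
    """Mean within-cluster similarity minus mean between-cluster similarity."""
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    within = S[same & off_diagonal]
    between = S[~same]
    if within.size == 0 or between.size == 0:
        return -np.inf  # degenerate: one big cluster or all singletons
    return within.mean() - between.mean()


def sketch_cluster(X, percentiles=(50, 60, 70, 80, 90)):
    """Try several similarity thresholds and keep the best-scoring clustering."""
    S = cosine_similarity_matrix(X)
    pairwise = S[np.triu_indices_from(S, k=1)]
    best_labels = np.zeros(S.shape[0], dtype=int)
    best_score = -np.inf
    for p in percentiles:
        labels = threshold_clusters(S, np.percentile(pairwise, p))
        score = separation_score(S, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


if __name__ == "__main__":
    # Two groups of points that lie along different directions.
    X = [[1, 0], [1, 0.1], [0.9, 0.05], [0, 1], [0.1, 1], [0.05, 0.9]]
    labels, score = sketch_cluster(X)
    print(labels, round(float(score), 3))
```

Per the summary, SPINEX combines several similarity measures and adapts its parameters; the sketch fixes one measure and a small threshold grid only to keep the selection loop easy to follow.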

SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems

SPINEX: Similarity-based Predictions with Explainable Neighbors Exploration for Anomaly and Outlier Detection

SPINEX-TimeSeries: Similarity-based Predictions with Explainable Neighbors Exploration for Time Series and Forecasting Problems

ExClus: Explainable Clustering on Low-dimensional Data Representations

SPINEX-Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration

Contrastive explainable clustering with differential privacy

Explainable Clustering via Exemplars: Complexity and Efficient Approximation Algorithms

Subspace Clustering by Directly Solving Discriminative K-means

An Extenics-Based Criteria Clustering Method

A comprehensive framework for explainable cluster analysis

Clusters in Explanation Space: Inferring disease subtypes from model explanations

Fast and explainable clustering based on sorting

FINEX: A Fast Index for Exact & Flexible Density-Based Clustering (Extended Version with Proofs)

Deep Descriptive Clustering

Simultaneous Estimation of Number of Clusters and Feature Sparsity in Clustering High-Dimensional Data

SPICE: Semantic Pseudo-labeling for Image Clustering

ModEx and Seed-Detective: Two novel techniques for high quality clustering by using good initial seeds in K-Means

Towards Explainable Clustering: A Constrained Declarative based Approach

Provable Data Clustering via Innovation Search

A Sparse Framework for Robust Possibilistic K-Subspace Clustering