Deep Clustering via Distribution Learning

Guanfang Dong,Zijie Tan,Chenqiu Zhao,Anup Basu

2024-08-07

Abstract: Distribution learning finds probability density functions from a set of data samples, whereas clustering aims to group similar data points to form clusters. Although there are deep clustering methods that employ distribution learning methods, past work still lacks theoretical analysis regarding the relationship between clustering and distribution learning. Thus, in this work, we provide a theoretical analysis to guide the optimization of clustering via distribution learning. To achieve better results, we embed deep clustering guided by a theoretical analysis. Furthermore, the distribution learning method cannot always be directly applied to data. To overcome this issue, we introduce a clustering-oriented distribution learning method called Monte-Carlo Marginalization for Clustering. We integrate Monte-Carlo Marginalization for Clustering into Deep Clustering, resulting in Deep Clustering via Distribution Learning (DCDL). Eventually, the proposed DCDL achieves promising results compared to state-of-the-art methods on popular datasets. Considering a clustering task, the new distribution learning method outperforms previous methods as well.

Machine Learning

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The main purpose of this paper is to improve deep clustering through distribution learning. Specifically, the authors make the following contributions:

1. **Theoretical Analysis**:
   - Provide a theoretical analysis of the relationship between clustering tasks and distribution learning.
   - Treat each data point as a sample from an underlying distribution and the entire dataset as a mixture model.
   - Achieve clustering by simplifying the prior distribution.
2. **Method Innovation**:
   - Propose a method called Monte-Carlo Marginalization for Clustering (MCMarg-C).
   - MCMarg-C is optimized for high-dimensional data and can learn distributions directly from such data.
   - In the reported experiments, MCMarg-C surpasses existing clustering methods.
3. **Overall Framework**:
   - Propose a new framework called Deep Clustering via Distribution Learning (DCDL).
   - In this framework, an autoencoder first reduces the dimensionality of the data, Uniform Manifold Approximation and Projection (UMAP) then transforms the result into a manifold space, and finally MCMarg-C performs distribution learning to obtain the clustering labels.
4. **Experimental Validation**:
   - Conduct experiments on popular datasets such as MNIST, FashionMNIST, USPS, and Pendigits.
   - The reported results show that DCDL outperforms existing deep clustering methods.

In summary, this paper aims to improve the effectiveness of deep clustering through theoretical analysis and a new distribution learning method, and demonstrates the approach on standard benchmark datasets.
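The idea of treating the dataset as a mixture model and reading cluster labels off the fitted components can be illustrated with a small sketch. This is not the authors' MCMarg-C. It swaps in plain expectation-maximization (EM) on a one-dimensional Gaussian mixture, and the Gaussian form of the components is an assumption made here for illustration. The values in `embedded` are made-up numbers standing in for the low-dimensional output of the autoencoder and UMAP stages, and the function names `fit_mixture_em` and `assign_clusters` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_mixture_em(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture with plain EM (a stand-in, not MCMarg-C)."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps the density finite
    return weights, means, variances


def assign_clusters(data, weights, means, variances):
    """Label each point with the component that has the highest posterior."""
    labels = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        labels.append(max(range(len(joint)), key=joint.__getitem__))
    return labels


# Hypothetical 1-D embeddings: two well-separated groups.
embedded = [0.1, -0.2, 0.3, 0.0, 5.1, 4.8, 5.3, 5.0]
weights, means, variances = fit_mixture_em(embedded, means=[-1.0, 6.0])
labels = assign_clusters(embedded, weights, means, variances)
print(labels)
```

In this sketch each mixture component plays the role of one cluster, so the clustering label of a point is simply the index of the component with the largest posterior probability. The paper's method differs in how the mixture parameters are learned and in operating on high-dimensional data.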

Deep Clustering and Visualization for End-to-End High-Dimensional Data Analysis

Deep Discriminative Clustering Analysis

Deep Density-based Image Clustering

Deep Image Clustering Based on Curriculum Learning and Density Information

Deep Reinforcement Clustering

DPC-DNG: Graph-based Label Propagation of K-Nearest Higher-Density Neighbors for Density Peaks Clustering

Density Peaks Clustering by Granular Computing with Label Propagation

Parallel Massive Clustering of Discrete Distributions

Improved Training of Deep Text Clustering

Deep Divergence-Based Approach to Clustering

Deep embedded clustering with distribution consistency preservation for attributed networks

Semi-supervised deep embedded clustering

Dual-disentangled Deep Multiple Clustering

Deep Fair Discriminative Clustering

Deep graph clustering by integrating community structure with neighborhood information

Deep Learning with Nonparametric Clustering

Deep Clustering with Diffused Sampling and Hardness-aware Self-distillation

Deep Fuzzy Variable C-Means Clustering Incorporated with Curriculum Learning

Distributed Clustering based on Distributional Kernel

A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions