Abstract: In the face of complex natural images, existing deep clustering algorithms fall well short of supervised classification methods in clustering accuracy, which limits their practicality. This paper introduces an image clustering algorithm based on self-supervised pretrained models and latent feature distribution optimization that substantially improves clustering performance. The main findings are: (1) For complex natural images, leveraging self-supervised pretrained models and fine-tuning them effectively enhances the discriminative power of latent features, improving clustering performance. (2) In the latent feature space, searching for the k nearest neighbor images of each training sample and shortening the distance between the sample and its nearest neighbors further enhances the discriminative power of latent features and improves clustering performance. (3) In the latent feature space, reducing the distance between sample features and their nearest predefined cluster centroids optimizes the distribution of latent features and further improves clustering performance. In experiments on multiple datasets, our approach outperforms the latest clustering algorithms and achieves state-of-the-art clustering results. When a dataset has few categories with significant differences between them, such as CIFAR-10 and STL-10, our algorithm achieves accuracy similar to that of supervised methods that do not use pretrained models and slightly lower than that of supervised methods that do. The code for the algorithm is available at https://github.com/LihengHu/semi.
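Findings (2) and (3) can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: neighbors are found by cosine similarity, the neighbor term is the mean cosine distance (1 - cos), and the "predefined cluster centroids" are taken to be the hypothetical one-hot basis vectors of R^K. In a real pipeline these quantities would be computed on encoder outputs and minimized by gradient descent; the functions here only evaluate them.

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(u, v):
    """Dot product; equals cosine similarity when u and v are unit vectors."""
    return sum(a * b for a, b in zip(u, v))

def knn_indices(features, k):
    """Finding (2), step 1: for each sample, the indices of its k most
    cosine-similar other samples (assumed similarity measure)."""
    feats = [l2_normalize(f) for f in features]
    result = []
    for i, fi in enumerate(feats):
        scored = [(cosine(fi, fj), j) for j, fj in enumerate(feats) if j != i]
        scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first; ties by index
        result.append([j for _, j in scored[:k]])
    return result

def neighbor_loss(features, neighbors):
    """Finding (2), step 2: mean cosine distance (1 - cos) between each sample
    and its neighbors. Minimizing it pulls samples toward their neighbors."""
    feats = [l2_normalize(f) for f in features]
    dists = [1.0 - cosine(feats[i], feats[j])
             for i, nbrs in enumerate(neighbors) for j in nbrs]
    return sum(dists) / len(dists) if dists else 0.0

def one_hot_centroids(num_clusters):
    """HYPOTHETICAL predefined centroids: the standard basis vectors of R^K."""
    return [[1.0 if d == c else 0.0 for d in range(num_clusters)]
            for c in range(num_clusters)]

def centroid_loss(features, centroids):
    """Finding (3): mean squared Euclidean distance from each feature to its
    nearest centroid, plus that centroid's index, used as the cluster label."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    labels = [min(range(len(centroids)), key=lambda c: sq_dist(f, centroids[c]))
              for f in features]
    loss = sum(sq_dist(f, centroids[c]) for f, c in zip(features, labels)) / len(features)
    return loss, labels
```

For example, with features `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` and `k=1`, `knn_indices` pairs sample 0 with 1 and sample 2 with 3, and `neighbor_loss` on those pairs is close to zero. Using fixed centroids means the nearest-centroid index directly gives a cluster assignment, with no separate k-means step.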

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

This paper addresses the relatively low clustering accuracy of existing deep clustering algorithms on complex natural images. Compared with supervised classification methods, these algorithms perform poorly on such images, which limits their practical applications. To solve this problem, the authors propose an image clustering algorithm based on a self-supervised pretrained model and latent feature distribution optimization. The algorithm improves clustering performance in three ways:

1. **Utilizing Self-Supervised Pretrained Models**: Self-supervised models pretrained on large-scale public datasets are fine-tuned, which enhances the discriminative ability of latent features and thereby improves clustering performance.
2. **k-Nearest Neighbors Enhancement**: In the latent feature space, the k nearest neighbor images of each training sample are found and the distance between the sample and its nearest neighbors is shortened, further enhancing the discriminative ability of latent features and improving clustering performance.
3. **Optimizing Latent Feature Distribution**: Reducing the distance between sample features and their nearest predefined cluster centroids optimizes the latent feature distribution, further enhancing clustering performance.

In experiments on multiple datasets, this method surpasses the latest clustering algorithms. On datasets with few categories, such as CIFAR-10 and STL-10, its clustering accuracy is close to that of supervised methods that do not use pretrained models and slightly lower than that of supervised methods that do.
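The comparisons with supervised methods rely on clustering accuracy. Because cluster IDs are arbitrary, the common deep-clustering convention (assumed here, since the summary does not define the metric) is to score predictions under the best one-to-one mapping from clusters to classes. A minimal brute-force sketch, with `clustering_accuracy` as a hypothetical helper name:

```python
from itertools import permutations

def clustering_accuracy(labels, clusters, num_classes):
    """Accuracy of cluster assignments under the best one-to-one cluster-to-class mapping.

    Assumes true labels and cluster IDs are both integers in range(num_classes)
    and that at least one sample is given.
    """
    # counts[c][y] = number of samples placed in cluster c whose true class is y
    counts = [[0] * num_classes for _ in range(num_classes)]
    for y, c in zip(labels, clusters):
        counts[c][y] += 1
    # Try every mapping perm (cluster c -> class perm[c]) and keep the best hit count.
    best = max(
        sum(counts[c][perm[c]] for c in range(num_classes))
        for perm in permutations(range(num_classes))
    )
    return best / len(labels)
```

Enumerating all K! mappings is only practical for small K and quickly becomes slow as K grows; the same matching is normally solved with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` applied to the negated count matrix.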

Image Clustering Algorithm Based on Self-Supervised Pretrained Models and Latent Feature Distribution Optimization