Abstract: Unsupervised representation learning for image clustering is essential in computer vision. Although advances in visual models have improved image clustering through efficient visual representations, challenges remain. First, these features often fail to represent the internal structure of images, which hinders accurate clustering of visually similar images. Second, existing features tend to lack finer-grained semantic labels, which limits their ability to capture nuanced differences and similarities between images. In this paper, we first introduce a jigsaw-based strategy for image clustering called Grid Jigsaw Representation (GJR), with a systematic exposition, from pixel to feature, of the discrepancy between human and computer. We emphasize that this algorithm, which mimics how humans solve jigsaw puzzles, can effectively improve the model's ability to distinguish spatial features between different samples and enhance its clustering ability. GJR modules are appended to a variety of deep convolutional networks and yield significant improvements on a wide range of benchmark datasets, including CIFAR-10, CIFAR-100/20, STL-10, ImageNet-10 and ImageNetDog-15. Separately, convergence efficiency is a persistent challenge for unsupervised image clustering. Recently, pretrained representation learning has made great progress, and released models can extract mature visual representations. Using a pretrained model as the feature extractor can speed up the convergence of clustering; our aim is to offer a new perspective on image clustering with reasonable resource usage and to provide a new baseline. Further, we propose the pretrain-based Grid Jigsaw Representation (pGJR), which builds on GJR. Experimental results show its effectiveness on the clustering task with respect to three metrics (ACC, NMI and ARI), along with very fast convergence.
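
The abstract reports results in terms of ACC, NMI and ARI. Since cluster ids are arbitrary, clustering accuracy (ACC) is commonly computed after finding the best one-to-one mapping between predicted clusters and ground-truth labels. The sketch below illustrates that convention, assuming integer labels starting at 0; the function name `clustering_accuracy` is hypothetical, and this is not the authors' evaluation code. NMI and ARI are available in scikit-learn as `normalized_mutual_info_score` and `adjusted_rand_score`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to labels.

    Assumes both inputs are sequences of non-negative integers.
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    n = max(y_true.max(), y_pred.max()) + 1

    # counts[i, j] = number of samples placed in cluster i whose true label is j
    counts = np.zeros((n, n), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)

    # Hungarian algorithm: pick the cluster-to-label assignment that
    # maximizes the number of correctly matched samples.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


labels = [0, 0, 1, 1, 2, 2]
clusters = [1, 1, 0, 0, 2, 2]  # same partition, different cluster ids
print(clustering_accuracy(labels, clusters))  # 1.0
```

Because the mapping is optimized before scoring, a prediction that recovers the correct partition under permuted ids still scores 1.0, which is why ACC here is not the same as plain classification accuracy.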
