Revisiting the Self-supervised Learning Method of Solving Jigsaw Puzzles.

Xuchao Gong, Qilu Zhao, Runlin Li, Zongmin Li
DOI: https://doi.org/10.1145/3556223.3556236
2022-01-01
Abstract: Spatial information is important for unsupervised feature learning. Previous work has shown that solving jigsaw puzzles as a pretext task can train a convolutional neural network whose features transfer to other visual tasks, such as image classification and object detection. A jigsaw puzzle solver learns that an object is made of parts and what those parts are, so the learned features capture semantically relevant content. That work proposed a powerful learning mechanism which, used as a pre-training method, outperformed the previous state of the art for object detection and classification on PASCAL VOC 2007. However, the original work has several deficiencies, notably its evaluation scheme and its lack of empirical ablation analysis. In this paper, we choose a more direct evaluation scheme and more appropriate datasets to conduct empirical ablation experiments on the self-supervised learning mechanism of solving jigsaw puzzles, and we extend the mechanism to train auto-encoders for more general evaluation. We explored how to build convolutional neural networks and auto-encoders by solving jigsaw puzzles with two types of network architecture (siamese-ennead network and straight-line network), resulting in four different jigsaw puzzle solvers. In the experiments, we evaluated the features learned by these solvers and discussed the influence of several training tricks on them. The results showed that solving jigsaw puzzles was very effective for unsupervised representation learning. The best configuration of every solver outperformed the state of the art on STL-10. In particular, one solver achieved 80.07% ± 0.08% classification accuracy, about 5.87% higher than the state-of-the-art method on STL-10, which is a substantial improvement. We also report many empirical results on the influence of different training tricks and network configurations, which are useful for the application and further study of jigsaw puzzle solvers.
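To make the pretext task concrete, the sketch below shows one common way to turn an unlabeled image into a jigsaw training sample: cut it into tiles, shuffle the tiles with a permutation drawn from a fixed set, and use the index of that permutation as the target. It is an illustration under stated assumptions, not the authors' implementation. The 3x3 grid, the function name make_jigsaw_sample, the size of the permutation set, and the random stand-in image are all assumptions that the abstract does not specify.

```python
import numpy as np


def make_jigsaw_sample(image, permutations, rng, grid=3):
    """Cut `image` (H, W, C) into grid x grid tiles and shuffle them.

    Returns the shuffled tiles with shape (grid*grid, h, w, C) and the index
    of the permutation that was applied, which serves as the training target.
    """
    h, w = image.shape[0] // grid, image.shape[1] // grid
    # Tiles are collected in row-major order: tile k sits at (k // grid, k % grid).
    tiles = np.stack([
        image[r * h:(r + 1) * h, c * w:(c + 1) * w]
        for r in range(grid)
        for c in range(grid)
    ])
    label = int(rng.integers(len(permutations)))
    order = np.asarray(permutations[label])
    # Position i of the puzzle now holds original tile order[i].
    return tiles[order], label


rng = np.random.default_rng(0)
# Hypothetical permutation set: 100 random orderings of the 9 tiles.
permutations = [rng.permutation(9) for _ in range(100)]
# Random stand-in for an unlabeled 96x96 RGB image (the STL-10 image size).
image = rng.random((96, 96, 3))

tiles, label = make_jigsaw_sample(image, permutations, rng)
print(tiles.shape, label)
```

A solver network would take the nine shuffled tiles as input and be trained to predict the permutation index, so no human labels are needed. How the tiles are fed to the network (for example, one branch per tile versus a single input) is an architectural choice that this sketch leaves open.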