Self-supervised visual learning in the low-data regime: a comparative evaluation

Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Yuki M. Asano, Efstratios Gavves, Georgios Th. Papadopoulos
2024-04-26
Abstract: Self-Supervised Learning (SSL) is a valuable and robust training methodology for contemporary Deep Neural Networks (DNNs), enabling unsupervised pretraining on a "pretext task" that does not require ground-truth labels or annotations. This allows efficient representation learning from massive amounts of unlabeled training data, which in turn leads to increased accuracy in a "downstream task" through supervised transfer learning. Although SSL is relatively straightforward to conceptualize and apply, it is not always feasible to collect or use very large pretraining datasets, especially in real-world application settings. In particular, in specialized, domain-specific application scenarios, it may not be practical to assemble a relevant image pretraining dataset on the order of millions of instances, or it could be computationally infeasible to pretrain at this scale. This motivates an investigation into the effectiveness of common SSL pretext tasks when the pretraining dataset is of relatively limited size. In this context, this work introduces a taxonomy of modern visual SSL methods, accompanied by detailed explanations and insights regarding the main categories of approaches, and subsequently conducts a thorough comparative experimental evaluation in the low-data regime, aiming to identify: a) what is learnt via low-data SSL pretraining, and b) how different SSL categories behave in such training scenarios. Interestingly, for domain-specific downstream tasks, in-domain low-data SSL pretraining outperforms the common approach of large-scale pretraining on general datasets. Based on the obtained results, valuable insights are highlighted regarding the performance of each category of SSL methods, which in turn suggest straightforward future research directions in the field.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper evaluates how effective self-supervised learning (SSL) methods are when the amount of pretraining data is limited. Specifically, it examines how different SSL pretext tasks perform in visual representation learning when the pretraining dataset is relatively small (for example, 50,000 to 300,000 images). The motivation is that in many real-world application fields (such as medical imaging), large-scale datasets are difficult to collect, even when the data do not need to be manually labeled. Moreover, even when a large amount of data can be collected, limited computing resources may make pretraining at that scale infeasible.

The main objectives of the paper are:

1. **Explore what can be learned from SSL pretraining with a low data volume**: By performing SSL pretraining on small-scale datasets, study which useful features or representations the models can learn.
2. **Compare how different SSL methods behave in low-data scenarios**: Compare different categories of SSL methods (such as contrastive learning, generative learning, clustering, and self-distillation) under low-data conditions to understand which are better suited to this scenario.

Through these studies, the paper aims to provide useful insights for researchers working in specific fields (such as X-ray image analysis), where large amounts of data, even unlabeled data, are usually difficult to obtain. The findings may help improve the performance of existing models and may also guide future research directions.
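
The low-data regime described above (pretraining sets of roughly 50,000 to 300,000 images) can be illustrated with a minimal sketch of how such subsets might be drawn from a larger unlabeled pool. This is not the authors' procedure: the function name `low_data_subsets`, the pool size, the intermediate subset size, and the choice of nested random sampling are all assumptions made for illustration.

```python
import random


def low_data_subsets(num_images, sizes, seed=0):
    """Draw nested random index subsets from an unlabeled image pool.

    Hypothetical helper: the source does not say how its low-data
    pretraining sets were constructed. Nesting (each smaller subset is
    contained in every larger one) is an assumption that makes results
    across sizes easier to compare.
    """
    if max(sizes) > num_images:
        raise ValueError("requested subset is larger than the pool")
    rng = random.Random(seed)          # fixed seed for reproducibility
    order = list(range(num_images))
    rng.shuffle(order)                 # one shuffle shared by all sizes
    return {size: sorted(order[:size]) for size in sorted(sizes)}


# Assumed pool of 400,000 unlabeled images; the 50k and 300k sizes match
# the range mentioned in the text, the 100k size is an added example.
subsets = low_data_subsets(num_images=400_000,
                           sizes=[50_000, 100_000, 300_000])
```

Each returned index list could then be used to select the images for one SSL pretraining run, for example by wrapping a dataset object with the chosen indices.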