Self-Supervised Learning for Real-World Object Detection: a Survey

Alina Ciocarlan, Sidonie Lefebvre, Sylvie Le Hégarat-Mascle, Arnaud Woiselle
2024-10-11
Abstract: Self-Supervised Learning (SSL) has emerged as a promising approach in computer vision, enabling networks to learn meaningful representations from large unlabeled datasets. SSL methods fall into two main categories: instance discrimination and Masked Image Modeling (MIM). While instance discrimination is fundamental to SSL, it was originally designed for classification and may be less effective for object detection, particularly for small objects. In this survey, we focus on SSL methods specifically tailored for real-world object detection, with an emphasis on detecting small objects in complex environments. Unlike previous surveys, we offer a detailed comparison of SSL strategies, including object-level instance discrimination and MIM methods, and assess their effectiveness for small object detection using both CNN and ViT-based architectures. Specifically, our benchmark is performed on the widely-used COCO dataset, as well as on a specialized real-world dataset focused on vehicle detection in infrared remote sensing imagery. We also assess the impact of pre-training on custom domain-specific datasets, highlighting how certain SSL strategies are better suited for handling uncurated data. Our findings highlight that instance discrimination methods perform well with CNN-based encoders, while MIM methods are better suited for ViT-based architectures and custom dataset pre-training. This survey provides a practical guide for selecting optimal SSL strategies, taking into account factors such as backbone architecture, object size, and custom pre-training requirements. Ultimately, we show that choosing an appropriate SSL pre-training strategy, along with a suitable encoder, significantly enhances performance in real-world object detection, particularly for small object detection in frugal settings.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to address?

This paper explores the application of self-supervised learning (SSL) methods to real-world object detection, with particular attention to small-object detection and resource-constrained settings. Specifically, it focuses on the following aspects:

1. **Applicability of SSL methods**:
   - Many existing SSL methods, instance discrimination in particular, were originally designed for classification and may be less effective for object detection, especially for small objects. The paper compares different SSL strategies in detail to identify those best suited to object detection.
2. **Comparison of different SSL paradigms**:
   - The paper compares the two main SSL paradigms, instance discrimination and Masked Image Modeling (MIM), and evaluates their performance with different network architectures (CNN and ViT), especially on small-object detection tasks.
3. **Pre-training in specific domains**:
   - The paper studies the impact of pre-training on custom domain-specific datasets, which matters when labeled data is scarce or difficult to obtain. Its experiments examine which SSL strategies are better suited to unlabeled, uncurated domain-specific data.
4. **Real-world application scenarios**:
   - In addition to benchmarking on the widely used COCO dataset, the paper evaluates a real-world scenario, vehicle detection in infrared remote-sensing imagery, using the VEDAI dataset. This helps show how SSL methods perform under complex backgrounds and different sensor conditions.
5. **Selection of the optimal SSL strategy**:
   - The paper provides practical guidelines for selecting an SSL strategy, taking into account factors such as backbone architecture, object size, and custom pre-training requirements.

The ultimate goal is to help researchers and practitioners select the most appropriate SSL method for their specific requirements and thereby improve object detection performance.

### Summary

The paper offers a systematic survey and benchmark of SSL-based pre-training for object detection, with a focus on small-object detection and resource-constrained settings. By comparing the performance of different SSL methods, it aims to provide a practical reference for future research and real-world applications.