Semi-Supervised Object Detection: A Survey on Progress from CNN to Transformer

Tahira Shehzadi, Ifza, Didier Stricker, Muhammad Zeshan Afzal

2024-07-16

Abstract: The impressive advancements in semi-supervised learning have driven researchers to explore its potential in object detection tasks within the field of computer vision. Semi-Supervised Object Detection (SSOD) leverages a combination of a small labeled dataset and a larger, unlabeled dataset. This approach effectively reduces the dependence on large labeled datasets, which are often expensive and time-consuming to obtain. Initially, SSOD models encountered challenges in effectively leveraging unlabeled data and managing noise in generated pseudo-labels for unlabeled data. However, numerous recent advancements have addressed these issues, resulting in substantial improvements in SSOD performance. This paper presents a comprehensive review of 27 cutting-edge developments in SSOD methodologies, from Convolutional Neural Networks (CNNs) to Transformers. We delve into the core components of semi-supervised learning and its integration into object detection frameworks, covering data augmentation techniques, pseudo-labeling strategies, consistency regularization, and adversarial training methods. Furthermore, we conduct a comparative analysis of various SSOD models, evaluating their performance and architectural differences. We aim to ignite further research interest in overcoming existing challenges and exploring new directions in semi-supervised learning for object detection.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses open problems in Semi-Supervised Object Detection (SSOD). Specifically:

1. **Reducing dependence on large amounts of labeled data**: SSOD combines a small labeled dataset with a larger unlabeled one, reducing the need for annotations that are expensive and time-consuming to obtain.
2. **Improving SSOD performance**: Early SSOD models struggled to use unlabeled data effectively and to manage noise in the pseudo-labels generated for it. The paper surveys the methods that address these challenges and have substantially improved SSOD performance.
3. **Reviewing the latest advancements**: The paper reviews 27 recent SSOD methods, from Convolutional Neural Networks (CNNs) to Transformer architectures, discussing their core components, technical strategies, and architectural differences.
4. **Promoting further research**: It aims to encourage further research on overcoming existing challenges and exploring new directions in SSOD.

Through these efforts, the paper hopes to advance SSOD technology, enabling its broader application in fields such as autonomous driving, medical image analysis, agriculture, and manufacturing.
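To make the pseudo-label noise problem concrete, here is a minimal sketch of one common way SSOD pipelines handle it: a teacher model predicts boxes on unlabeled images, only predictions above a confidence threshold are kept as pseudo-labels, and the teacher's weights track the student's through an exponential moving average. This is an illustration under stated assumptions, not the procedure of any specific method in the survey; the `Detection` class, the function names, the threshold of 0.7, and the decay of 0.99 are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One teacher prediction on an unlabeled image (hypothetical structure)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str
    score: float  # teacher confidence in [0, 1]


def filter_pseudo_labels(detections: List[Detection],
                         threshold: float = 0.7) -> List[Detection]:
    """Keep only predictions whose confidence reaches the threshold.

    Low-confidence boxes are the main source of pseudo-label noise,
    so they are dropped before the student is trained on them.
    """
    return [d for d in detections if d.score >= threshold]


def ema_update(teacher: List[float], student: List[float],
               decay: float = 0.99) -> List[float]:
    """Move each teacher weight a small step toward the student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy teacher output for one unlabeled image (values are made up).
teacher_predictions = [
    Detection((10, 20, 110, 220), "car", 0.93),
    Detection((30, 40, 60, 90), "person", 0.41),
    Detection((200, 50, 260, 120), "person", 0.78),
]

pseudo_labels = filter_pseudo_labels(teacher_predictions, threshold=0.7)
print([d.label for d in pseudo_labels])  # ['car', 'person']

# Toy weight vectors standing in for full model parameters.
new_teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
```

A fixed threshold trades recall for precision: raising it removes more noisy boxes but also discards correct detections of hard objects, which is one reason later methods explore adaptive or class-wise alternatives.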
