Source-Free Domain Adaptation for YOLO Object Detection

Simon Varailhon, Masih Aminbeidokhti, Marco Pedersoli, Eric Granger
2024-09-25
Abstract: Source-free domain adaptation (SFDA) is a challenging problem in object detection, where a pre-trained source model is adapted to a new target domain without using any source domain data for privacy and efficiency reasons. Most state-of-the-art SFDA methods for object detection have been proposed for Faster-RCNN, a detector that is known to have high computational complexity. This paper focuses on domain adaptation techniques for real-world vision systems, particularly for the YOLO family of single-shot detectors known for their fast baselines and practical applications. Our proposed SFDA method - Source-Free YOLO (SF-YOLO) - relies on a teacher-student framework in which the student receives images with a learned, target domain-specific augmentation, allowing the model to be trained with only unlabeled target data and without requiring feature alignment. A challenge with self-training using a mean-teacher architecture in the absence of labels is the rapid decline of accuracy due to noisy or drifting pseudo-labels. To address this issue, a teacher-to-student communication mechanism is introduced to help stabilize the training and reduce the reliance on annotated target data for model selection. Despite its simplicity, our approach is competitive with state-of-the-art detectors on several challenging benchmark datasets, even sometimes outperforming methods that use source data for adaptation.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses source-free domain adaptation (SFDA) for object detection: adapting a pre-trained source model to a new target domain without using any source-domain data, for privacy and efficiency reasons. Most existing SFDA methods are designed for Faster-RCNN, which has high computational complexity, whereas this paper focuses on the single-stage detectors of the YOLO family, which are known for their fast baselines and their use in practical applications.

### Main contributions of the paper:

1. **Introduction of Source-Free YOLO (SF-YOLO)**:
   - This is the first SFDA method specifically designed for YOLO detectors. It establishes a benchmark for future research and is especially suited to real-time applications.
   - The method uses a teacher-student framework with a learned, target-domain-specific data augmentation module, allowing training with only unlabeled target-domain data and without feature alignment.
2. **Proposal of the Student Stabilisation Module (SSM)**:
   - This module aims to alleviate the training instability and accuracy degradation caused by the lack of labeled data.
   - SSM provides a new communication channel from the teacher to the student. By periodically replacing the student model with the moving average of the teacher model, it prevents a rapid decline in the student's performance and makes the whole pipeline more robust.
3. **Experimental verification**:
   - The authors ran experiments on several challenging domain-adaptation benchmark datasets, including Cityscapes, Foggy Cityscapes, Sim10k and KITTI.
   - The results show that the proposed method is competitive with state-of-the-art detectors on these benchmarks and in some cases outperforms UDA methods that require source data.
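The two communication channels described above (student to teacher through a moving average, and teacher back to student through SSM) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: model weights are represented as dictionaries of floats, `step_fn` is a hypothetical stand-in for one gradient step on the student, the momentum values are placeholders, and reading the SSM update as a blend of student and teacher weights is an assumption.

```python
def ema_update(target, source, momentum):
    """In place: target <- momentum * target + (1 - momentum) * source."""
    for name, value in source.items():
        target[name] = momentum * target[name] + (1.0 - momentum) * value


def train_with_ssm(student, teacher, step_fn, epochs, steps_per_epoch,
                   teacher_momentum=0.999, ssm_momentum=0.5):
    """Mean-teacher loop with an end-of-epoch student stabilisation step.

    step_fn is a hypothetical callback that updates the student in place
    (one optimisation step on pseudo-labelled target data).
    Both momentum defaults are illustrative, not values from the paper.
    """
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            step_fn(student)
            # Student -> teacher: the teacher tracks a moving average of the student.
            ema_update(teacher, student, teacher_momentum)
        # Teacher -> student (SSM, as assumed here): pull the student
        # back toward the slower, more stable teacher once per epoch.
        ema_update(student, teacher, ssm_momentum)
    return student, teacher
```

Setting `ssm_momentum=0.0` in this sketch replaces the student with the teacher outright, while values closer to 1 leave the student almost unchanged.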
### Core technologies of the paper:

- **Target Augmentation Module (TAM)**:
  - This module learns target-domain-specific data augmentation strategies instead of generating random strong-weak augmentation pairs.
  - TAM enriches the target-domain data by using style transfer to transform input images according to the statistical characteristics of the target-domain images.
- **Consistency learning and teacher knowledge**:
  - The teacher model generates pseudo-labels to train the student model, and a consistency loss is introduced to avoid overfitting.
  - The consistency loss filters out low-confidence predictions with a classification confidence threshold, then trains the student on the augmented images to be consistent with the resulting hard labels.
- **Student Stabilisation Module (SSM)**:
  - Because the student model learns faster and is prone to making mistakes, SSM is introduced to stabilize training.
  - At the end of each epoch, the student's weights are updated with the moving average of the teacher model, which keeps the student close to the teacher and prevents excessive deviation.

### Summary:

By introducing SF-YOLO and SSM, this paper addresses the challenges of source-free domain adaptation for object detection, especially in real-time application scenarios. The experimental results show that the method performs well on multiple benchmark datasets and also improves the robustness and stability of training.
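The confidence-based filtering described under "Consistency learning and teacher knowledge" can be illustrated with a short sketch. It assumes teacher predictions arrive as a list of boxes with a class and a classification confidence; the `Detection` type, the function name and the threshold value are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Detection:
    box: Box
    class_id: int
    confidence: float  # classification confidence from the teacher


def to_hard_pseudo_labels(detections: List[Detection],
                          threshold: float = 0.25) -> List[Tuple[Box, int]]:
    """Keep teacher detections at or above the threshold and drop the
    scores, so the student is trained against hard (box, class) labels.
    The default threshold is a placeholder, not a value from the paper."""
    return [(d.box, d.class_id) for d in detections if d.confidence >= threshold]
```

In a training step, the teacher would predict on the original target image, the surviving labels would be produced by a function like this one, and the student's loss would be computed on the augmented version of the same image.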