Abstract: The accurate detection of vehicles, pedestrians, and other relevant targets on the road is crucial for ensuring the safety of autonomous driving. In recent years, object detectors based on Transformers or CNNs have achieved excellent performance in the fully supervised paradigm. However, when a trained model is applied directly to unfamiliar scenes where the training and testing data follow statistically different distributions, its performance may drop dramatically. Unsupervised domain adaptive object detection methods have been proposed to address this issue, but their performance tends to degrade as the gap between the source and target domains widens. Previous works mainly focused on the style gap when reducing the domain gap, while ignoring the content gap. To tackle this challenge, we introduce a novel method called IDI-SCD that addresses the style and content gaps simultaneously. First, the domain gap is reduced by disentangling it into a style gap and a content gap while generating the corresponding intermediate domains. Second, during training, we focus on a single domain gap at a time to achieve inter-domain invariance: the content gap is tackled while the style gap is held fixed, and vice versa. Specifically, a style-invariant loss is used to narrow the style gap, and a mean teacher self-training framework is used to narrow the content gap. Finally, we introduce a multiscale fusion strategy to improve the quality of pseudo-labels, mainly targeting detection performance on extreme-scale objects (very large or very small objects). We conduct extensive experiments on four mainstream datasets of in-vehicle images. The experimental results demonstrate the effectiveness of our method and its superiority over most existing methods.
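Two of the components named above can be illustrated in generic form. The sketch below is a hypothetical illustration under stated assumptions, not the IDI-SCD implementation. `ema_update` is the standard mean teacher rule, in which the teacher's weights are an exponential moving average of the student's. `fuse_multiscale` is a swapped-in, generic pseudo-label fusion: the abstract does not specify the fusion rule. Here, detections produced on several rescaled copies of an image are mapped back to the original resolution, pooled, filtered by a confidence threshold, and de-duplicated with non-maximum suppression. The function names, the EMA rate `alpha`, the thresholds `score_thr` and `iou_thr`, and the class-agnostic treatment of boxes are all assumptions.

```python
import numpy as np

# Hypothetical sketch: alpha, score_thr, iou_thr and the set of scales are
# assumed values; the abstract does not specify them.

def ema_update(teacher, student, alpha=0.999):
    """Mean teacher update: teacher <- alpha * teacher + (1 - alpha) * student."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def nms(boxes, scores, iou_thr=0.5):
    """Greedy class-agnostic NMS; returns indices of the boxes that are kept."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too strongly.
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thr]
    return np.array(keep, dtype=int)

def fuse_multiscale(dets_per_scale, scales, score_thr=0.5, iou_thr=0.5):
    """Fuse teacher detections from several input scales into one pseudo-label set.

    dets_per_scale[i] = (boxes, scores) predicted on the image resized by scales[i].
    """
    # Map boxes back to the original image resolution before pooling them.
    boxes = np.concatenate([b / s for (b, _), s in zip(dets_per_scale, scales)])
    scores = np.concatenate([sc for _, sc in dets_per_scale])
    # Keep only confident detections as pseudo-labels.
    mask = scores >= score_thr
    boxes, scores = boxes[mask].reshape(-1, 4), scores[mask]
    if boxes.shape[0] == 0:
        return boxes, scores
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]
```

As a usage example, an upscaled pass (scale 2.0) can recover a tiny object that the original-scale pass misses, while duplicate detections of the same object across scales are merged by NMS:

```python
dets = [
    (np.array([[10, 10, 50, 50], [0, 0, 5, 5]]), np.array([0.9, 0.3])),
    (np.array([[20, 20, 100, 100], [200, 200, 210, 210]]), np.array([0.8, 0.7])),
]
boxes, scores = fuse_multiscale(dets, scales=[1.0, 2.0])
```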

Inter-Domain Invariant Cross-Domain Object Detection Using Style and Content Disentanglement for In-Vehicle Images
