Abstract: Despite their success in various vision tasks, deep neural network architectures often underperform in out-of-distribution scenarios due to the difference between training and target domain style. To address this limitation, we introduce One-Shot Style Adaptation (OSSA), a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Specifically, OSSA generates diverse target styles by perturbing the style statistics derived from a single target image and then applies these styles to a labeled source dataset at the feature level using Adaptive Instance Normalization (AdaIN). Extensive experiments show that OSSA establishes a new state-of-the-art among one-shot domain adaptation methods by a significant margin, and in some cases, even outperforms strong baselines that use thousands of unlabeled target images. By applying OSSA in various scenarios, including weather, simulated-to-real (sim2real), and visual-to-thermal adaptations, our study explores the overarching significance of the style gap in these contexts. OSSA's simplicity and efficiency allow easy integration into existing frameworks, providing a potentially viable solution for practical applications with limited data availability. Code is available at https://github.com/RobinGerster7/OSSA

What problem does this paper attempt to address?

This paper addresses the following problem: in object detection, when the training data and the test data come from different distributions (a domain gap), how can a small number of unlabeled target-domain images, or even just one, be used to adapt quickly to the style of the target domain and thereby improve model performance? Specifically, the paper proposes an unsupervised one-shot style adaptation method named OSSA (One-Shot Style Adaptation), which aims to bridge the style difference between the source domain and the target domain using only one unlabeled target-domain image.

### Problem Background

Traditional deep neural networks (such as convolutional neural networks, CNNs) perform well in object detection tasks, but they usually assume that the training data and the test data come from the same distribution. In practical applications this assumption often does not hold, for example:

- **Weather changes**: from sunny days to foggy days.
- **Simulated-to-real transfer (sim2real)**: from simulated environments to real ones.
- **Cross-spectral changes**: from visible-light to thermal-infrared images.

These shifts can significantly degrade the performance of object detectors, which is especially problematic in high-risk fields such as autonomous driving and medical diagnosis.

### Core Contributions of the Paper

1. **Proposed a simple and effective unsupervised one-shot style adaptation method (OSSA)**, which can significantly improve object detection performance with only a single unlabeled target-domain image.
2. **Verified that a single image can effectively capture the style of the target domain**, thereby greatly reducing the need for large amounts of target-domain data.
3. **Explored the importance of bridging the style gap in multiple scenarios**, including weather changes, simulated-to-real transfer, and adaptation from visible-light to thermal-infrared images.

### Method Overview

OSSA is implemented through the following steps:

1. **Extract the target-domain style**: Extract style statistics from a single target-domain image.
2. **Generate diverse target styles**: Generate diverse style features by perturbing the extracted style statistics.
3. **Apply AdaIN for style transfer**: Apply the generated style features to the feature maps of the source-domain images using Adaptive Instance Normalization (AdaIN).

The AdaIN formula is:

\[ \text{AdaIN}(x, y)=\sigma(y)\left(\frac{x - \mu(x)}{\sigma(x)}\right)+\mu(y) \]

where $\mu(x)$ and $\sigma(x)$ are the mean and standard deviation of the source feature map $x$, and $\mu(y)$ and $\sigma(y)$ are the mean and standard deviation of the target-style feature map $y$.

OSSA extends this process by introducing multiplicative Gaussian noise to generate new style features:

\[ \text{OSSA}(x_i, y)=\alpha\sigma(y)\left(\frac{x_i - \mu(x_i)}{\sigma(x_i)}\right)+\beta\mu(y),\quad\alpha,\beta\sim \mathcal{N}(1, 0.75) \]

### Experimental Results

Experiments show that OSSA performs strongly on multiple benchmarks, particularly Cityscapes to Foggy Cityscapes and Sim10k to Cityscapes. It not only outperforms existing one-shot adaptation methods but also approaches, and in some cases exceeds, strong baselines that require thousands of unlabeled target-domain images.

In conclusion, OSSA provides an efficient and practical solution that can significantly improve the generalization ability of object detectors when target-domain data is limited.
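
The AdaIN and OSSA formulas given in the Method Overview can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It rests on several assumptions: feature maps are arrays of shape `(C, H, W)`, statistics are computed per channel, `alpha` and `beta` are drawn independently per channel, and the `0.75` in $N(1, 0.75)$ is treated as a standard deviation, which the summary does not specify. The function names `channel_stats`, `adain`, and `ossa_style_transfer` and the stabilizer `eps` are hypothetical.

```python
import numpy as np


def channel_stats(feat, eps=1e-5):
    """Per-channel mean and standard deviation of a (C, H, W) feature map."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    # eps keeps the division in AdaIN stable for near-constant channels.
    sigma = np.sqrt(feat.var(axis=(1, 2), keepdims=True) + eps)
    return mu, sigma


def adain(x, y, eps=1e-5):
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y)."""
    mu_x, sigma_x = channel_stats(x, eps)
    mu_y, sigma_y = channel_stats(y, eps)
    return sigma_y * (x - mu_x) / sigma_x + mu_y


def ossa_style_transfer(x, y, rng, noise_std=0.75, eps=1e-5):
    """AdaIN with multiplicative Gaussian noise on the target style statistics.

    Assumption: alpha and beta are sampled per channel with mean 1 and
    standard deviation `noise_std`.
    """
    mu_x, sigma_x = channel_stats(x, eps)
    mu_y, sigma_y = channel_stats(y, eps)
    channels = x.shape[0]
    alpha = rng.normal(1.0, noise_std, size=(channels, 1, 1))
    beta = rng.normal(1.0, noise_std, size=(channels, 1, 1))
    return alpha * sigma_y * (x - mu_x) / sigma_x + beta * mu_y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source_feat = rng.normal(0.0, 1.0, size=(4, 8, 8))  # stand-in source features
    target_feat = rng.normal(3.0, 2.0, size=(4, 6, 6))  # stand-in target features
    styled = ossa_style_transfer(source_feat, target_feat, rng)
    print(styled.shape)
```

Each call to `ossa_style_transfer` draws fresh noise, so repeated calls with the same target features yield different perturbed styles, which is how a single target image can supply a variety of styles during training. Setting `noise_std=0.0` reduces the sketch to plain AdaIN.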

OSSA: Unsupervised One-Shot Style Adaptation

Target-driven One-Shot Unsupervised Domain Adaptation

Domain Adaptation for Object Detection via Style Consistency

Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

Universal Model Adaptation by Style Augmented Open-set Consistency

Noise Transfer for Unsupervised Domain Adaptation of Retinal OCT Images

Learnable Data Augmentation for One-Shot Unsupervised Domain Adaptation

Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain

Style Adaptation for Domain-adaptive Semantic Segmentation

AWADA: Attention-Weighted Adversarial Domain Adaptation for Object Detection

Sequence-To-Sequence Domain Adaptation Network For Robust Text Image Recognition

Visually Source-Free Domain Adaptation via Adversarial Style Matching

One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers

AWADA: Foreground-focused adversarial learning for cross-domain object detection

Learning intra-domain style-invariant representation for unsupervised domain adaptation of semantic segmentation

Adaptive Context- and Scale-Aware Aggregation with Feature Alignment for One-Shot Object Detection

Enhancing Visual Domain Adaptation with Source Preparation

Robust Object Detection Via Adversarial Novel Style Exploration

Style-Guided Adversarial Teacher for Cross-Domain Object Detection

Intra- & Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization

Adversarial Style Discrepancy Minimization for Unsupervised Domain Adaptation