OSSA: Unsupervised One-Shot Style Adaptation

Robin Gerster, Holger Caesar, Matthias Rapp, Alexander Wolpert, Michael Teutsch
2024-10-02
Abstract: Despite their success in various vision tasks, deep neural network architectures often underperform in out-of-distribution scenarios due to the difference between training and target domain style. To address this limitation, we introduce One-Shot Style Adaptation (OSSA), a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Specifically, OSSA generates diverse target styles by perturbing the style statistics derived from a single target image and then applies these styles to a labeled source dataset at the feature level using Adaptive Instance Normalization (AdaIN). Extensive experiments show that OSSA establishes a new state-of-the-art among one-shot domain adaptation methods by a significant margin, and in some cases, even outperforms strong baselines that use thousands of unlabeled target images. By applying OSSA in various scenarios, including weather, simulated-to-real (sim2real), and visual-to-thermal adaptations, our study explores the overarching significance of the style gap in these contexts. OSSA's simplicity and efficiency allow easy integration into existing frameworks, providing a potentially viable solution for practical applications with limited data availability. Code is available at https://github.com/RobinGerster7/OSSA
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the following problem: in object detection, when the training data and the test data come from different distributions (a domain gap), how can a small number of unlabeled target-domain images, or even a single one, be used to adapt quickly to the style of the target domain and thereby improve model performance? To this end, the paper proposes OSSA (One-Shot Style Adaptation), an unsupervised one-shot style adaptation method that aims to bridge the style difference between the source domain and the target domain using only one unlabeled target-domain image.

### Problem Background

Deep neural networks such as convolutional neural networks (CNNs) perform well in object detection, but they usually assume that the training data and the test data come from the same distribution. In practical applications this assumption often does not hold, for example:

- **Weather changes**: from sunny days to foggy days.
- **Simulated-to-real transfer**: from simulated environments to real environments.
- **Cross-spectral changes**: from visible-light to thermal-infrared images.

These shifts can significantly degrade the performance of object detectors, which is a particular concern in high-risk fields such as autonomous driving and medical diagnosis.

### Core Contributions of the Paper

1. **Proposed a simple and effective unsupervised one-shot style adaptation method (OSSA)**, which can significantly improve object detection performance with only a single unlabeled target-domain image.
2. **Verified that a single image can effectively capture the style of the target domain**, thereby greatly reducing the need for large amounts of target-domain data.
3. **Explored the importance of bridging the style gap in multiple scenarios**, including weather changes, simulated-to-real transfer, and adaptation from visible-light to thermal-infrared images.

### Method Overview

OSSA is implemented through the following steps:

1. **Extract the target-domain style**: Extract style statistics from a single target-domain image.
2. **Generate diverse target styles**: Generate diverse style features by perturbing the extracted style statistics.
3. **Apply AdaIN for style transfer**: Apply the generated style features to the feature maps of the source-domain images using Adaptive Instance Normalization (AdaIN).

AdaIN is defined as:

\[ \text{AdaIN}(x, y)=\sigma(y)\left(\frac{x - \mu(x)}{\sigma(x)}\right)+\mu(y) \]

where $\mu(x)$ and $\sigma(x)$ are the mean and standard deviation of the source feature map $x$, and $\mu(y)$ and $\sigma(y)$ are the mean and standard deviation of the target-style feature map $y$. OSSA extends this process by introducing multiplicative Gaussian noise to generate new style features:

\[ \text{OSSA}(x_i, y)=\alpha\sigma(y)\left(\frac{x_i - \mu(x_i)}{\sigma(x_i)}\right)+\beta\mu(y),\quad\alpha,\beta\sim \mathcal{N}(1, 0.75) \]

### Experimental Results

Experiments show that OSSA performs strongly on multiple benchmarks, especially on the Cityscapes to Foggy Cityscapes and Sim10k to Cityscapes tasks. It outperforms existing one-shot adaptation methods and approaches, in some cases even exceeds, strong baselines that require thousands of unlabeled target-domain images.

In conclusion, OSSA provides an efficient and practical solution that can significantly improve the generalization ability of object detectors when only limited data is available.
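
The AdaIN and OSSA formulas from the Method Overview can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only mirrors the two equations on a single `(C, H, W)` feature map. The function names, the `EPS` stabilizer, the choice to draw one $\alpha$ and one $\beta$ per channel, and the reading of 0.75 as a standard deviation rather than a variance are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-5  # stabilizer to avoid division by zero (assumption, not from the summary)


def channel_stats(feat):
    """Per-channel mean and standard deviation over the spatial axes of a (C, H, W) map."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + EPS
    return mu, sigma


def adain(x, y):
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y)."""
    mu_x, sigma_x = channel_stats(x)
    mu_y, sigma_y = channel_stats(y)
    return sigma_y * (x - mu_x) / sigma_x + mu_y


def ossa_style(x, y, rng, noise_scale=0.75):
    """Perturbed AdaIN: alpha scales sigma(y), beta scales mu(y).

    Assumptions: noise_scale is used as the Gaussian standard deviation,
    and alpha/beta are sampled independently for every channel.
    """
    mu_x, sigma_x = channel_stats(x)
    mu_y, sigma_y = channel_stats(y)
    channels = x.shape[0]
    alpha = rng.normal(1.0, noise_scale, size=(channels, 1, 1))
    beta = rng.normal(1.0, noise_scale, size=(channels, 1, 1))
    return alpha * sigma_y * (x - mu_x) / sigma_x + beta * mu_y
```

Calling `ossa_style(source_feat, target_feat, np.random.default_rng())` repeatedly with the same single target feature map yields a different stylized source feature map each time, which is the sense in which one target image is expanded into diverse target styles. With `noise_scale=0.0` the function reduces to plain `adain`.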