Abstract: In this work, we propose a novel approach, namely WeatherDG, that can generate realistic, weather-diverse driving-scene images based on the cooperation of two foundation models, i.e., Stable Diffusion (SD) and a Large Language Model (LLM). Specifically, we first fine-tune SD with source data, aligning the content and layout of generated samples with real-world driving scenarios. Then, we propose a procedural prompt generation method based on the LLM, which can enrich scenario descriptions and help SD automatically generate more diverse, detailed images. In addition, we introduce a balanced generation strategy, which encourages SD to generate high-quality objects of tail classes, such as riders and motorcycles, under various weather conditions. This segmentation-model-agnostic method can improve the generalization ability of existing models by additionally adapting them with the generated synthetic data. Experiments on three challenging datasets show that our method can significantly improve the segmentation performance of different state-of-the-art models on target domains. Notably, in the "Cityscapes to ACDC" setting, our method improves the baseline HRDA by 13.9% in mIoU.
### What problem does this paper attempt to solve?
This paper addresses the **Domain Generalization (DG) problem** for semantic segmentation, particularly **under adverse weather conditions**. Specifically, the authors propose a method named **WeatherDG** that generates realistic, weather-diverse images consistent with driving scenes, in order to improve a model's generalization ability in unseen domains.
#### Background and problem description
1. **Domain Shift Problem**:
- In autonomous driving, the performance of existing semantic segmentation models declines significantly when they are deployed in unseen domains, because of domain shift. The problem is especially pronounced under adverse conditions such as fog, rain, snow, and night.
- Collecting more diverse training data is one solution, but annotating segmentation data is very time-consuming, so domain generalization has become a popular way to tackle domain shift.
2. **Limitations of existing methods**:
- Existing domain generalization methods fall mainly into two categories: normalization and data augmentation. Of the two, data augmentation is more flexible: it can be combined with different model architectures and is easy to integrate with other techniques.
- Although generative models such as Stable Diffusion (SD) can produce realistic and diverse images, images generated by applying these models directly may have styles and layouts that are inconsistent with driving scenarios, which degrades model performance.
#### Solution
To solve the above problems, the authors propose the WeatherDG method. Its core idea is to generate realistic, diverse images that are in line with driving scenarios through the following steps:
1. **SD Fine-tuning**:
- Fine-tune the Stable Diffusion model on the source data, so that the content and layout of the generated images are aligned with real-world driving scenarios.
2. **Procedural Prompt Generation**:
- Use a large language model (LLM) to generate prompts procedurally, enriching the scene descriptions and helping Stable Diffusion automatically generate more diverse and detailed images.
- Introduce a balanced generation strategy that encourages the generation of high-quality objects from tail classes (such as riders and motorcycles) under various weather conditions.
3. **Sample Generation and Model Training**:
- Use the fine-tuned Stable Diffusion model and the generated prompts to produce new, diverse samples, and train the segmentation model on these samples together with the source data.
- Apply unsupervised domain adaptation (UDA) techniques to further improve the model's performance on the target domain.
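The prompt generation and balancing steps above can be illustrated with a minimal sketch. It is not the authors' implementation: a fixed text template stands in for the LLM, inverse-frequency sampling is only one plausible way to favor tail classes, and the names `CLASS_FREQ`, `WEATHERS`, `class_weights`, `build_prompt`, and `generate_prompts`, along with the frequency values, are hypothetical.

```python
import random

# Hypothetical per-class frequency shares (illustrative values, not from the paper).
CLASS_FREQ = {"car": 0.60, "person": 0.25, "rider": 0.10, "motorcycle": 0.05}
# Weather descriptors used to diversify the prompts (assumed wording).
WEATHERS = ["foggy", "rainy", "snowy", "nighttime"]


def class_weights(freq):
    """Return inverse-frequency sampling weights, normalised to sum to 1."""
    raw = {cls: 1.0 / share for cls, share in freq.items()}
    total = sum(raw.values())
    return {cls: w / total for cls, w in raw.items()}


def build_prompt(rng, weights):
    """Sample one class and one weather condition, then fill a fixed template."""
    classes = list(weights)
    cls = rng.choices(classes, weights=[weights[c] for c in classes], k=1)[0]
    weather = rng.choice(WEATHERS)
    # Assumption: a template replaces the LLM-based scene enrichment here.
    text = f"A photo of a driving scene with a {cls} in {weather} conditions"
    return cls, weather, text


def generate_prompts(n, seed=0):
    """Generate n (class, weather, prompt) triples with tail classes favored."""
    rng = random.Random(seed)
    weights = class_weights(CLASS_FREQ)
    return [build_prompt(rng, weights) for _ in range(n)]


if __name__ == "__main__":
    for _, _, prompt in generate_prompts(5):
        print(prompt)
```

With these illustrative frequencies, rare classes such as motorcycles are sampled far more often than cars, which is the intended effect of a balancing step. The resulting prompts would then be passed to the fine-tuned Stable Diffusion model.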
Through these steps, WeatherDG can significantly improve the generalization ability of semantic segmentation models under various adverse weather conditions. Experiments on three datasets show consistent gains across different state-of-the-art models; notably, in the "Cityscapes to ACDC" setting, WeatherDG improves the HRDA baseline by 13.9% in mIoU.
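For readers unfamiliar with the metric, mIoU (mean intersection over union) averages, over classes, the ratio of correctly labeled pixels to all pixels either predicted as or annotated with that class. The sketch below shows one common way to compute it on flattened label lists. The function name `mean_iou` is hypothetical, and the `ignore_index=255` default follows a common Cityscapes-style convention that is assumed here rather than taken from the paper.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred and target are equal-length sequences of integer class ids.
    Pixels whose ground-truth label equals ignore_index are skipped.
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # true positives + false positives + false negatives
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the ground-truth class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(round(mean_iou(pred, target, num_classes=3), 4))  # prints 0.7222
```

In the example, the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18.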
### Summary
This paper proposes WeatherDG, a novel data augmentation framework that combines Stable Diffusion with a large language model to address the domain generalization problem. It performs particularly well on semantic segmentation under adverse weather conditions.