Abstract: Generalization with respect to domain shifts, which frequently appear in applications such as autonomous driving, is one of the remaining big challenges for deep learning models. Therefore, we propose an exemplar-based style synthesis pipeline to improve domain generalization in semantic segmentation. Our method is based on a novel masked noise encoder for StyleGAN2 inversion. The model learns to faithfully reconstruct the image, preserving its semantic layout through noise prediction. Using the proposed masked noise encoder to randomize style and content combinations in the training set, i.e., intra-source style augmentation (ISSA), effectively increases the diversity of training data and reduces spurious correlation. As a result, we achieve up to $12.4\%$ mIoU improvements on driving-scene semantic segmentation under different types of data shifts, i.e., changing geographic locations, adverse weather conditions, and day to night. ISSA is model-agnostic and straightforwardly applicable with CNNs and Transformers. It is also complementary to other domain generalization techniques, e.g., it improves the recent state-of-the-art solution RobustNet by $3\%$ mIoU in Cityscapes to Dark Zürich. In addition, we demonstrate the strong plug-and-play ability of the proposed style synthesis pipeline, which is readily usable for extra-source exemplars, e.g., web-crawled images, without any retraining or fine-tuning. Moreover, we study a new use case: indicating a neural network's generalization capability by building a stylized proxy validation set. This application is of significant practical value for selecting models to be deployed in open-world environments. Our code is available at https://github.com/boschresearch/ISSA.

What problem does this paper attempt to address?

This paper aims to improve how well deep learning models generalize under domain shifts, especially the shifts commonly seen in applications such as autonomous driving. Specifically, it proposes an exemplar-based style synthesis method to improve domain generalization in semantic segmentation. The method increases the diversity of the training data and reduces the spurious correlations present in the training set.

### Core Problems of the Paper

1. **Domain Generalization Challenges**: Deep learning models often perform poorly in unseen environments or conditions (such as different geographic locations, adverse weather, or day-to-night changes). The model may only be exposed to a limited or biased data set during training, which leads to insufficient generalization.
2. **Lack of Data Diversity**: Existing methods usually require data from multiple source domains to improve generalization, but collecting and annotating such data is time-consuming and expensive.
3. **Separation of Style and Content**: The appearance style of an image must be changed while its semantic layout stays unchanged, so that data diversity increases without invalidating the labels.

### Solutions

The paper proposes an Exemplar-Based Style Synthesis Pipeline, which mainly includes the following aspects:

1. **Masked Noise Encoder**:
   - **High-Fidelity Reconstruction**: Random noise masking ensures that the generator can faithfully reconstruct the image while preserving its semantic layout.
   - **Style-Mixing Ability**: Random noise masking helps to separate style and content information, enabling the model to change the style of an image without changing its content.
2. **Intra-Source Style Augmentation (ISSA)**:
   - **Enhancing Data Diversity**: Styles and contents are extracted from training samples within the source domain and randomly recombined, which increases the diversity of the training data.
   - **Reducing Spurious Correlations**: Style mixing reduces the spurious correlations between style and content in the training set and improves the model's generalization.
3. **Extra-Source Style Augmentation (ESSA)**:
   - **Extension to Extra-Source Exemplars**: The pipeline can be applied directly to exemplars from outside the source domain (such as images crawled from the web) without retraining or fine-tuning, further improving generalization performance.
4. **Evaluating the Generalization Ability of the Model**:
   - **Stylized Proxy Validation Set**: Transferring the style of unlabeled target-domain data to labeled data yields a stylized proxy validation set for estimating how well a model generalizes to unseen data.

### Main Contributions

- **High-Quality Reconstruction and Style Mixing**: A new masked noise encoder that reconstructs complex scene images with high fidelity and supports effective style mixing.
- **Improvement of Domain Generalization Performance**: ISSA improves domain generalization in semantic segmentation across different network architectures and domain shifts, with gains of up to 12.4% mIoU.
- **Plug-and-Play Applicability**: The masked noise encoder can be applied directly to extra-source exemplars without retraining.
- **Model Selection Tool**: The stylized proxy validation set provides a way to evaluate a model's generalization ability without additional annotation work.

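
The intra-source style augmentation step described above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: `toy_encode` and `toy_generate` are hypothetical stand-ins for the masked noise encoder and the StyleGAN2 generator, an "image" is just a flat list of pixel values, "style" is reduced to the mean intensity, and the "noise map" is the per-pixel residual. What the sketch does show is the recombination logic: each sample keeps its own content code and label, and receives the style code of a randomly chosen sample from the same source set.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[float]   # toy stand-in for an image (flat list of pixels)
Style = float         # toy stand-in for a style latent
Noise = List[float]   # toy stand-in for a spatial noise map (content)
Label = List[int]     # toy stand-in for a per-pixel segmentation mask


def toy_encode(image: Image) -> Tuple[Style, Noise]:
    """Hypothetical encoder: style = mean intensity, noise = residual layout."""
    style = sum(image) / len(image)
    return style, [p - style for p in image]


def toy_generate(style: Style, noise: Noise) -> Image:
    """Hypothetical generator: recombine a style code with a content code."""
    return [style + n for n in noise]


def intra_source_style_mix(
    images: Sequence[Image],
    labels: Sequence[Label],
    encode: Callable[[Image], Tuple[Style, Noise]],
    generate: Callable[[Style, Noise], Image],
    rng: random.Random,
) -> List[Tuple[Image, Label]]:
    """Pair each sample's content with the style of a random source sample."""
    codes = [encode(img) for img in images]
    donors = list(range(len(images)))
    rng.shuffle(donors)  # donors[i] is the index that lends its style to sample i
    mixed = []
    for i, j in enumerate(donors):
        style_j, _ = codes[j]
        _, noise_i = codes[i]
        # The label of sample i is reused unchanged: only the style was swapped.
        mixed.append((generate(style_j, noise_i), labels[i]))
    return mixed
```

Because the labels travel with the content code, the augmented pairs can be fed to any segmentation model without re-annotation, which matches the model-agnostic use described in the abstract.
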
In conclusion, this paper addresses the challenge of domain generalization in deep learning models by proposing an exemplar-based style synthesis method, which is particularly relevant for safety-critical applications such as autonomous driving.
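
The results above are reported in mIoU, and the proxy validation set is meant for ranking candidate models. As a minimal sketch of both ideas, the code below computes mean intersection-over-union from flat lists of predicted and ground-truth class ids, then picks the candidate with the highest score. The function names and the dictionary of per-model predictions are assumptions made for illustration; a real evaluation would also handle ignore labels and accumulate counts over the whole validation set.

```python
from typing import Dict, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU = |pred ∩ target| / |pred ∪ target|."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def pick_by_proxy(
    preds_by_model: Dict[str, Sequence[int]],
    proxy_labels: Sequence[int],
    num_classes: int,
) -> str:
    """Return the (hypothetical) model name with the highest proxy-set mIoU."""
    return max(
        preds_by_model,
        key=lambda name: mean_iou(preds_by_model[name], proxy_labels, num_classes),
    )
```

For example, with ground truth `[0, 1, 1, 1]` and prediction `[0, 0, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mIoU is 7/12.
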

Intra- & Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization
