Abstract: Domain generalization (DG) aims to learn a model from multiple source domains that generalizes well to unseen target domains without re-training. Most existing DG works are based on convolutional neural networks (CNNs). However, the local operation of the convolution kernel makes the model focus too much on local representations (e.g., texture), which makes it more prone to overfitting to the source domains and hampers its generalization ability. Recently, several MLP-based methods have achieved promising results in supervised learning tasks by learning global interactions among different patches of the image. Inspired by this, in this paper, we first analyze the difference between CNN and MLP methods in DG and find that MLP methods exhibit better generalization ability because they capture global representations (e.g., structure) better than CNN methods. Then, based on a recent lightweight MLP method, we obtain a strong baseline that outperforms most state-of-the-art CNN-based methods. The baseline learns global structure representations with a filter that suppresses structure-irrelevant information in the frequency space. Moreover, we propose a dynAmic LOw-Frequency spectrum Transform (ALOFT) that perturbs local texture features while preserving global structure features, thus enabling the filter to remove structure-irrelevant information sufficiently. Extensive experiments on four benchmarks demonstrate that our method achieves substantial performance improvements with a small number of parameters compared to state-of-the-art (SOTA) CNN-based DG methods. Our code is available at https://github.com/lingeringlight/ALOFT/.
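The abstract describes a filter that suppresses structure-irrelevant information in the frequency space. As a rough illustration only, the NumPy sketch below moves a feature map into the frequency domain with a 2D FFT, multiplies it element-wise by a filter, and transforms it back. The function name `frequency_filter`, the `(H, W, C)` layout, and the hand-built filters are assumptions made for this example and are not taken from the authors' code; in a trained model the filter weights would presumably be learned.

```python
import numpy as np

def frequency_filter(features, weight):
    """Filter a real feature map element-wise in the frequency domain.

    features: real array of shape (H, W, C)
    weight:   complex array of shape (H, W // 2 + 1, C), a hypothetical
              filter (it would be learned in a real model)
    """
    h, w, _ = features.shape
    spectrum = np.fft.rfft2(features, axes=(0, 1))          # spatial -> frequency
    filtered = spectrum * weight                             # scale each frequency component
    return np.fft.irfft2(filtered, s=(h, w), axes=(0, 1))   # frequency -> spatial

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))

# An all-ones filter leaves the feature map unchanged.
identity = np.ones((8, 5, 4), dtype=complex)
y_identity = frequency_filter(x, identity)

# A filter that keeps only the zero-frequency (DC) component
# reduces each channel to its spatial mean.
dc_only = np.zeros((8, 5, 4), dtype=complex)
dc_only[0, 0, :] = 1.0
y_dc = frequency_filter(x, dc_only)
```

Because the filter acts on every frequency of the whole feature map at once, a single multiplication mixes information from all spatial positions, which is one way to read the "global" behaviour the abstract attributes to MLP-like models.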

What problem does this paper attempt to address?

The paper addresses the problem of Domain Generalization (DG): how to build a model that learns from multiple source domains and still performs well on unseen target domains. Specifically, the paper points out that most existing DG methods are based on Convolutional Neural Networks (CNNs), but because convolution is a local operation, these models tend to focus too much on local features (such as textures), which leads to overfitting to the source domains and poor generalization to unseen target domains.

To overcome this drawback, the authors build on a lightweight MLP-like architecture and propose the dynAmic LOw-Frequency spectrum Transform (ALOFT). ALOFT improves the model's generalization ability by perturbing local texture features while preserving global structural features. The main contributions of the paper include:

1. **Frequency Perspective Analysis**: The authors analyze how MLP-like methods work in DG tasks from a frequency perspective and find that MLP-like methods make better use of global structural information and therefore generalize better.
2. **Lightweight MLP-like Architecture**: A lightweight MLP-like architecture is proposed that significantly improves performance while keeping the network small.
3. **Dynamic Low-Frequency Transform (ALOFT)**: Two variants, ALOFT-E and ALOFT-S, model the distribution of low-frequency spectra at the element level and the statistic level, respectively, to simulate potential domain shifts and further strengthen the model's ability to capture global representations.

With these contributions, the proposed method achieves significant performance improvements on four standard domain generalization benchmarks while using a small number of parameters compared to SOTA CNN-based methods.
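The summary above only states that ALOFT-E and ALOFT-S model the low-frequency spectrum at the element level and the statistic level. The NumPy sketch below is one possible reading of that idea, not the authors' implementation: it keeps the phase of each image, resamples only the low-frequency amplitudes, and leaves higher frequencies untouched. The helper names, the `radius` parameter, the Gaussian sampling, and the use of batch statistics are all assumptions made for illustration.

```python
import numpy as np

def low_freq_mask(h, w, radius):
    """Boolean (h, w) mask of FFT bins whose |frequency| <= radius on both axes."""
    fy = np.minimum(np.arange(h), h - np.arange(h))  # |vertical frequency| per row
    fx = np.minimum(np.arange(w), w - np.arange(w))  # |horizontal frequency| per column
    return (fy[:, None] <= radius) & (fx[None, :] <= radius)

def perturb_low_freq(x, radius=2, level="element", rng=None):
    """Resample low-frequency amplitudes of a batch x with shape (B, H, W)."""
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = x.shape
    spec = np.fft.fft2(x, axes=(1, 2))
    amp, phase = np.abs(spec), np.angle(spec)
    mask = low_freq_mask(h, w, radius)
    low = amp[:, mask]                                # (B, K) low-frequency amplitudes
    if level == "element":
        # Assumed element-level variant: one Gaussian per frequency bin,
        # estimated across the batch.
        mu = low.mean(axis=0, keepdims=True)
        sigma = low.std(axis=0, keepdims=True)
        new_low = mu + sigma * rng.standard_normal(low.shape)
    else:
        # Assumed statistic-level variant: jitter each sample's mean and std
        # using their spread across the batch, then re-normalize.
        mu = low.mean(axis=1, keepdims=True)
        sigma = low.std(axis=1, keepdims=True) + 1e-6
        new_mu = mu + mu.std(axis=0) * rng.standard_normal(mu.shape)
        new_sigma = sigma + sigma.std(axis=0) * rng.standard_normal(sigma.shape)
        new_low = (low - mu) / sigma * new_sigma + new_mu
    amp = amp.copy()
    amp[:, mask] = np.abs(new_low)                    # amplitudes must be non-negative
    new_spec = amp * np.exp(1j * phase)               # original phase is kept
    # Taking the real part discards the small imaginary residue caused by
    # the perturbed spectrum no longer being exactly conjugate-symmetric.
    return np.fft.ifft2(new_spec, axes=(1, 2)).real

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
out_element = perturb_low_freq(x, radius=2, level="element", rng=rng)
out_stat = perturb_low_freq(x, radius=2, level="statistic", rng=rng)
```

In this sketch the mask is symmetric around the zero frequency, so every bin outside the mask keeps its original amplitude after the round trip; only the band selected by `radius` changes.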

ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization
