ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization

Jintao Guo, Na Wang, Lei Qi, Yinghuan Shi
2023-03-31
Abstract: Domain generalization (DG) aims to learn a model that generalizes well to unseen target domains using multiple source domains, without re-training. Most existing DG works are based on convolutional neural networks (CNNs). However, the local operation of the convolution kernel makes the model focus too much on local representations (e.g., texture), which inherently makes the model more prone to overfitting the source domains and hampers its generalization ability. Recently, several MLP-based methods have achieved promising results in supervised learning tasks by learning global interactions among different patches of the image. Inspired by this, in this paper, we first analyze the difference between CNN and MLP methods in DG and find that MLP methods generalize better because they capture global representations (e.g., structure) better than CNN methods. Then, based on a recent lightweight MLP method, we obtain a strong baseline that outperforms most state-of-the-art CNN-based methods. The baseline can learn global structure representations with a filter that suppresses structure-irrelevant information in the frequency space. Moreover, we propose a dynAmic LOw-Frequency spectrum Transform (ALOFT) that can perturb local texture features while preserving global structure features, thus enabling the filter to remove structure-irrelevant information sufficiently. Extensive experiments on four benchmarks demonstrate that our method achieves substantial performance improvements with a small number of parameters compared to SOTA CNN-based DG methods. Our code is available at https://github.com/lingeringlight/ALOFT/.
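The abstract says the baseline suppresses structure-irrelevant information with a filter in frequency space, but it does not name the underlying MLP method or give the layer's equations. The sketch below therefore assumes a GFNet-style global filter: transform a feature map with a 2D FFT, multiply each frequency by a complex weight (learned in a real network, passed in here), and transform back. The function name `global_filter` and the `(H, W, C)` layout are illustrative choices, not taken from the ALOFT code.

```python
import numpy as np


def global_filter(x: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """Apply a frequency-domain filter to a real feature map (hypothetical helper).

    x:    real feature map of shape (H, W, C)
    filt: complex filter of shape (H, W // 2 + 1, C); assumed to be learned in practice
    """
    h, w, _ = x.shape
    # 2D real FFT over the spatial axes: every frequency mixes all H * W positions,
    # so the operation is global rather than limited to a small neighbourhood.
    spec = np.fft.rfft2(x, axes=(0, 1))
    # Element-wise product in frequency space equals a circular convolution in space
    # with a kernel as large as the whole feature map.
    spec = spec * filt
    # Back to the spatial domain at the original size.
    return np.fft.irfft2(spec, s=(h, w), axes=(0, 1))
```

With an all-ones filter the layer is the identity; keeping only the zero-frequency coefficient replaces every position with its channel mean. Both cases show that each output position can depend on the entire feature map.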
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses domain generalization (DG): learning, from multiple source domains, a model that performs well on unseen target domains. The authors note that most existing DG methods are based on convolutional neural networks (CNNs), and that the locality of convolution makes these models focus too heavily on local features such as texture. This leads to overfitting to the source domains and poor generalization to unseen target domains. To overcome this, the authors build a strong baseline on a recent lightweight MLP-like architecture and propose the dynAmic LOw-Frequency spectrum Transform (ALOFT), which perturbs local texture features while preserving global structure features, so that the baseline's frequency-space filter can remove structure-irrelevant information more thoroughly. The main contributions of the paper are:

1. **Frequency-perspective analysis**: The authors compare CNN and MLP-like methods in DG from a frequency perspective and find that MLP-like methods generalize better because they make better use of global structure information.
2. **Lightweight MLP-like baseline**: Built on a recent lightweight MLP method, the baseline outperforms most state-of-the-art CNN-based DG methods while keeping the network small.
3. **Dynamic Low-Frequency Transform (ALOFT)**: Two variants, ALOFT-E and ALOFT-S, model the distribution of the low-frequency spectrum at the element level and the statistic level, respectively, to simulate potential domain shifts and further strengthen the model's ability to capture global representations.

With these components, the method achieves significant performance improvements on four standard DG benchmarks while using fewer parameters than state-of-the-art CNN-based DG methods.
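The element-level and statistic-level variants can be illustrated with a small NumPy sketch. It is not the authors' implementation. Several choices are assumptions made for illustration: the box-shaped low-frequency mask and its `ratio`, the use of within-batch standard deviations as perturbation scales, the re-normalization step in the statistic-level branch, and the names `low_freq_mask` and `aloft_sketch`. Following the description of modeling the low-frequency spectrum, only amplitudes inside the low-frequency region are perturbed. Keeping the phase and the high-frequency amplitudes unchanged is a further assumption meant to preserve structure.

```python
import numpy as np


def low_freq_mask(h: int, w: int, ratio: float = 0.5) -> np.ndarray:
    """Boolean (H, W) mask of a box around the zero frequency of an fftshift-ed spectrum.

    The box covers frequencies -r..r on each axis, so it is symmetric under k -> -k.
    """
    mask = np.zeros((h, w), dtype=bool)
    rh, rw = max(1, int(h * ratio) // 2), max(1, int(w * ratio) // 2)
    ch, cw = h // 2, w // 2  # index of the zero frequency after fftshift
    mask[ch - rh:ch + rh + 1, cw - rw:cw + rw + 1] = True
    return mask


def aloft_sketch(x, mode="element", ratio=0.5, strength=1.0, rng=None):
    """Perturb the low-frequency amplitude spectrum of feature maps x of shape (B, C, H, W).

    Hypothetical helper; mode is "element" (ALOFT-E-like) or "statistic" (ALOFT-S-like).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, _, h, w = x.shape
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    amp, phase = np.abs(spec), np.angle(spec)
    mask = low_freq_mask(h, w, ratio)
    low = amp[..., mask]  # (B, C, K): low-frequency amplitudes

    if mode == "element":
        # Element level (assumed form): each low-frequency amplitude gets Gaussian noise
        # scaled by that element's standard deviation across the batch.
        scale = low.std(axis=0, keepdims=True)  # (1, C, K)
        new_low = low + strength * rng.standard_normal(low.shape) * scale
    elif mode == "statistic":
        # Statistic level (assumed form): resample each map's per-channel mean and std of
        # the low-frequency amplitudes, then re-normalize the amplitudes to the new values.
        mu = low.mean(axis=-1, keepdims=True)  # (B, C, 1)
        sd = low.std(axis=-1, keepdims=True) + 1e-6
        new_mu = mu + strength * rng.standard_normal(mu.shape) * mu.std(axis=0, keepdims=True)
        new_sd = sd + strength * rng.standard_normal(sd.shape) * sd.std(axis=0, keepdims=True)
        new_low = (low - mu) / sd * new_sd + new_mu
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    amp[..., mask] = np.maximum(new_low, 0.0)  # keep amplitudes non-negative
    # Phases and high-frequency amplitudes are left unchanged.
    new_spec = amp * np.exp(1j * phase)
    out = np.fft.ifft2(np.fft.ifftshift(new_spec, axes=(-2, -1)), axes=(-2, -1))
    # Independent noise at k and -k breaks Hermitian symmetry, so keep the real part.
    return out.real
```

Setting `strength=0.0` returns the input up to floating-point error. Because the mask is symmetric about the zero frequency, taking the real part after the inverse FFT leaves the high-frequency spectrum of the output equal to that of the input, so only the low-frequency band changes.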