BiPC: Bidirectional Probability Calibration for Unsupervised Domain Adaption

Wenlve Zhou, Zhiheng Zhou, Junyuan Shang, Chang Niu, Mingyue Zhang, Xiyuan Tao, Tianlei Wang
DOI: https://doi.org/10.1016/j.eswa.2024.125460
2024-09-29
Abstract: Unsupervised Domain Adaptation (UDA) leverages a labeled source domain to solve tasks in an unlabeled target domain. While Transformer-based methods have shown promise in UDA, their application is limited to plain Transformers, excluding Convolutional Neural Networks (CNNs) and hierarchical Transformers. To address these issues, we propose Bidirectional Probability Calibration (BiPC) from a probability space perspective. We demonstrate that the probability outputs from a pre-trained head, after extensive pre-training, are robust against domain gaps and can adjust the probability distribution of the task head. Moreover, the task head can enhance the pre-trained head during adaptation training, improving model performance through bidirectional complementation. Technically, we introduce Calibrated Probability Alignment (CPA) to adjust the pre-trained head's probabilities, such as those from an ImageNet-1k pre-trained classifier. Additionally, we design a Calibrated Gini Impurity (CGI) loss to refine the task head, with calibrated coefficients learned from the pre-trained classifier. BiPC is a simple yet effective method applicable to various networks, including CNNs and Transformers. Experimental results demonstrate its remarkable performance across multiple UDA tasks. Our code will be available at: https://github.com/Wenlve-Zhou/BiPC
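To make the Gini impurity idea concrete, here is a minimal sketch in Python. The Gini impurity of a probability vector p is 1 - sum_k p_k^2: it is 0 for a one-hot prediction and 1 - 1/K for a uniform prediction over K classes, so minimizing it on target predictions pushes the task head toward confident outputs. The per-sample weighting is an assumption on our part: the abstract says only that the calibrated coefficients are learned from the pre-trained classifier, so here `coeffs` is simply passed in, and the function names are hypothetical rather than taken from the BiPC code.

```python
from typing import Sequence


def gini_impurity(probs: Sequence[float]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a single probability vector."""
    return 1.0 - sum(p * p for p in probs)


def calibrated_gini_loss(
    batch_probs: Sequence[Sequence[float]],
    coeffs: Sequence[float],
) -> float:
    """Coefficient-weighted mean Gini impurity over a batch of target predictions.

    ASSUMPTION (hypothetical interface): `coeffs` holds one non-negative
    calibration coefficient per sample. In BiPC these are learned from the
    pre-trained classifier; this sketch does not model how.
    """
    if len(batch_probs) != len(coeffs):
        raise ValueError("need exactly one coefficient per sample")
    total = sum(coeffs)
    if total <= 0:
        raise ValueError("coefficients must sum to a positive value")
    weighted = sum(c * gini_impurity(p) for c, p in zip(coeffs, batch_probs))
    return weighted / total


if __name__ == "__main__":
    # Task-head softmax outputs for two target samples over 3 classes.
    probs = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3]]
    # Hypothetical coefficients that trust the first prediction more.
    coeffs = [0.8, 0.2]
    print(round(calibrated_gini_loss(probs, coeffs), 4))  # prints 0.28
```

Here the confident first sample contributes an impurity of 0.185 and the uncertain second one 0.66; the coefficients decide how strongly each sample is pushed toward a one-hot prediction. In a training loop the same computation would run on framework tensors so that it is differentiable.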
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How can the adaptability of different network architectures (such as CNNs and Transformers) be improved in Unsupervised Domain Adaptation (UDA), alleviating the performance degradation caused by the domain gap?**

Specifically, existing UDA methods fall mainly into two categories, feature-alignment methods and Transformer-based methods, and both have limitations:

1. **Feature-alignment methods**:
   - They are mainly built on Convolutional Neural Networks (CNNs) and are less effective when applied to newer architectures such as Transformers.
   - They usually do not consider class-wise distribution matching, which limits the performance of UDA methods.
2. **Transformer-based methods**:
   - Although they have shown strong potential, they apply only to plain Transformer models and cannot be directly applied to CNNs or hierarchical Transformers.

To solve these problems, the authors propose the **Bidirectional Probability Calibration (BiPC)** method. Working from the perspective of the probability space, it introduces the **Calibrated Probability Alignment (CPA)** and **Calibrated Gini Impurity (CGI)** losses to achieve bidirectional complementarity between the pre-trained head and the task head, thereby improving the model's performance on the target domain.

### Key contributions

1. **A simple and effective UDA method based on the probability space**, which alleviates the performance degradation caused by the domain gap.
2. **A calibrated probability alignment loss**, which aligns probability distributions using a calibration coefficient computed from the source labels and the target pseudo-labels.
3. **A calibrated Gini impurity loss** for pseudo-label learning, whose calibration coefficient is learned from the probability space of the pre-trained head.
4. **Extensive experiments showing that BiPC brings significant improvements across various backbone networks** and achieves state-of-the-art (SoTA) performance on the Partial-set Domain Adaptation (PDA) task.

With this design, BiPC not only improves the UDA performance of different network architectures but also strikes a better balance between the feature space and the probability space, improving the model's generalization ability.
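The calibrated probability alignment described above can be pictured with a small sketch. This summary does not give the formula, so the following is an assumption: we read the calibration coefficient for class k as the normalizer 1/n_k, where n_k counts source samples with label k (or target samples with pseudo-label k), so each domain yields one class-mean probability vector per class. The loss is then the squared L2 distance between source and target class means, averaged over classes present in both domains. All names are hypothetical, and plain Python lists stand in for the framework tensors a real implementation would use so that gradients reach the pre-trained head.

```python
from typing import Dict, List, Sequence


def class_mean_probs(
    probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Class-mean probability vectors: each sample of class k is weighted by 1 / n_k."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for p, y in zip(probs, labels):
        if y not in sums:
            sums[y] = [0.0] * len(p)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], p)]
        counts[y] += 1
    # ASSUMPTION: the calibration coefficient is the per-class normalizer 1 / n_k,
    # with n_k taken from source labels or target pseudo-labels.
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def calibrated_alignment_loss(
    src_probs: Sequence[Sequence[float]],
    src_labels: Sequence[int],
    tgt_probs: Sequence[Sequence[float]],
    tgt_pseudo_labels: Sequence[int],
) -> float:
    """Squared L2 gap between source and target class means, averaged over shared classes."""
    src_means = class_mean_probs(src_probs, src_labels)
    tgt_means = class_mean_probs(tgt_probs, tgt_pseudo_labels)
    shared = sorted(set(src_means) & set(tgt_means))
    if not shared:
        return 0.0  # no class appears in both domains, so nothing to align
    total = 0.0
    for k in shared:
        total += sum((a - b) ** 2 for a, b in zip(src_means[k], tgt_means[k]))
    return total / len(shared)


if __name__ == "__main__":
    # Toy 3-dimensional pre-trained-head outputs (hypothetical values).
    src_probs = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]]
    src_labels = [0, 0, 1]
    tgt_probs = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
    tgt_pseudo_labels = [0, 1]
    loss = calibrated_alignment_loss(src_probs, src_labels, tgt_probs, tgt_pseudo_labels)
    print(round(loss, 4))  # prints 0.03
```

In this toy batch, class 0 is already aligned (both domains average to [0.6, 0.3, 0.1]), so the whole loss comes from class 1. Averaging per class rather than per sample keeps frequent classes from dominating the alignment, which is one plausible motivation for label-derived coefficients.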