Learning cross-domain representations by vision transformer for unsupervised domain adaptation

Yifan Ye, Shuai Fu, Jing Chen
DOI: https://doi.org/10.1007/s00521-023-08269-7
2023-01-27
Neural Computing and Applications
Abstract: Unsupervised Domain Adaptation (UDA) is a popular machine learning technique for reducing the distribution discrepancy among domains. Most UDA methods use a deep Convolutional Neural Network (CNN) and a domain discriminator to learn a domain-invariant representation, but a domain-invariant representation is not necessarily a discriminative, domain-specific one. Transformers (TRANS), which have been shown to be more robust to domain shift than CNNs, have gradually become a powerful alternative to CNNs for feature representation. Moreover, the domain shift between the labeled source data and the unlabeled target data produces a significant amount of label noise, which calls for a more robust connection between the source and target domains. This paper proposes a simple yet effective UDA method for learning cross-domain representations with vision Transformers in a self-training manner. Unlike the conventional approach of dividing an image into multiple non-overlapping patches, we propose a novel method that aggregates both labeled source-domain patches and pseudo-labeled target-domain patches. In addition, a cross-domain alignment loss is proposed to match the centroids of labeled source patches and pseudo-labeled target patches. Extensive experiments show that the proposed method achieves state-of-the-art (SOTA) results on several standard UDA benchmarks (90.5 on ImageCLEF-DA, Office-31) with a Transformer baseline model and without any extra auxiliary networks.
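The centroid-matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the cross-domain alignment loss, so the version below is an assumption: it uses the mean squared Euclidean distance between per-class centroids of source features (true labels) and target features (pseudo-labels), averaged over the classes seen in both domains. The function names `class_centroids` and `centroid_alignment_loss` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def class_centroids(features, labels):
    """Return the mean feature vector for each class label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo_labels):
    """Assumed loss form: squared Euclidean distance between source and target
    class centroids, averaged over classes present in both domains.
    Returns 0.0 when the two domains share no class."""
    src_c = class_centroids(src_feats, src_labels)
    tgt_c = class_centroids(tgt_feats, tgt_pseudo_labels)
    shared = sorted(set(src_c) & set(tgt_c))
    if not shared:
        return 0.0
    total = 0.0
    for lab in shared:
        total += sum((a - b) ** 2 for a, b in zip(src_c[lab], tgt_c[lab]))
    return total / len(shared)


# Toy patch features: two source patches of class 0, one of class 1,
# and one pseudo-labeled target patch per class.
src_feats = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
src_labels = [0, 0, 1]
tgt_feats = [[1.0, 2.0], [1.0, 1.0]]
tgt_pseudo = [0, 1]
loss = centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo)
```

In a real training loop the features would be Transformer patch embeddings and the loss would be computed with a differentiable tensor library; plain Python lists are used here only to keep the sketch self-contained.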
computer science, artificial intelligence