Enhancing Multi-Source Open-Set Domain Adaptation through Nearest Neighbor Classification with Self-Supervised Vision Transformer
Jing Li, Liu Yang, Qinghua Hu
DOI: https://doi.org/10.1109/tcsvt.2023.3307789
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Domain adaptation mitigates the performance drop that occurs when a model trained on a source domain is applied to a target domain. Models designed for a closed set of categories struggle in real-world scenarios where the target domain contains unknown classes absent from the source domains. Furthermore, multiple source domains are likely to be annotated asynchronously by distinct agencies, each with its own data distribution. Despite its practical relevance, multi-source open-set domain adaptation (MSOSDA) has not been thoroughly investigated. The main difficulty in MSOSDA lies in learning a shared discriminative feature space across all domains while effectively separating source classes from target-specific ones. In this study, we propose a method for MSOSDA that combines a self-supervised vision Transformer (ViT) with nearest neighbor classification. Our key insight is to leverage the strong nearest neighbor classification property of self-supervised ViT, along with supervised contrastive learning. To explicitly align the domains and accurately identify unknown classes in the target domain, we employ straightforward strategies and an adaptive data-driven threshold. We extensively evaluate our approach on five multi-source domain adaptation benchmarks. Two of these benchmarks are fine-grained, and one of them is introduced for the first time in this paper. The experiments demonstrate the effectiveness of the proposed approach.
engineering, electrical & electronic
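
The abstract describes nearest neighbor classification over ViT features with a data-driven threshold for rejecting unknown target classes. The sketch below illustrates that general idea only and is not the authors' implementation. It assumes features have already been extracted, and it pools the features from all source domains into one array. The function names, the `UNKNOWN` label, the use of cosine similarity, and the threshold rule (mean nearest-neighbor similarity over the target batch) are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

UNKNOWN = -1  # hypothetical label for target-specific (unknown) classes


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def open_set_nn_predict(source_feats, source_labels, target_feats):
    """Label each target sample by its nearest source neighbor, or as UNKNOWN.

    source_feats:  (n_source, d) features pooled from all source domains
    source_labels: (n_source,) integer class labels
    target_feats:  (n_target, d) features of unlabeled target samples
    """
    s = l2_normalize(np.asarray(source_feats, dtype=float))
    t = l2_normalize(np.asarray(target_feats, dtype=float))

    sims = t @ s.T                 # (n_target, n_source) cosine similarities
    nn_idx = sims.argmax(axis=1)   # index of the nearest source sample
    nn_sim = sims.max(axis=1)      # similarity to that nearest sample

    # Assumed threshold rule (not taken from the paper): the mean
    # nearest-neighbor similarity computed from the target batch itself.
    threshold = nn_sim.mean()

    preds = np.asarray(source_labels)[nn_idx].copy()
    preds[nn_sim < threshold] = UNKNOWN
    return preds, threshold
```

For example, with source features `[[1, 0], [0, 1]]` labeled `[0, 1]` and target features `[[1, 0], [0, 1], [1, 1]]`, the first two target samples match a source sample exactly and keep its label, while the third falls below the batch mean similarity and is marked `UNKNOWN`.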