LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spaces

Anxhelo Diko,Antonino Furnari,Luigi Cinque,Giovanni Maria Farinella
2024-11-23
Abstract:Unsupervised domain adaptation remains a critical challenge in enabling the knowledge transfer of models across unseen domains. Existing methods struggle to balance the need for domain-invariant representations with preserving domain-specific features, which is often due to alignment approaches that impose the projection of samples with similar semantics close in the latent space despite their drastic domain differences. We introduce \mnamelong, a novel approach that shifts the focus from aligning representations in absolute coordinates to aligning the relative positioning of equivalent concepts in latent spaces. \mname defines a domain-agnostic structure upon the semantic/geometric relationships between class labels in language space and guides adaptation, ensuring that the organization of samples in visual space reflects reference inter-class relationships while preserving domain-specific characteristics. %We empirically demonstrate \mname's superiority in domain adaptation tasks across four diverse images and video datasets. Remarkably, \mname surpasses previous works in 18 different adaptation scenarios across four diverse image and video datasets with average accuracy improvements of +3.32% on DomainNet, +5.75% in GeoPlaces, +4.77% on GeoImnet, and +1.94% mean class accuracy improvement on EgoExo4D.
Computer Vision and Pattern Recognition,Artificial Intelligence,Machine Learning
What problem does this paper attempt to address?
### What problems does this paper attempt to solve? This paper aims to solve the key challenges in **Unsupervised Domain Adaptation (UDA)**. Specifically, existing methods have difficulties in balancing **domain - invariant representations** and **preserving domain - specific features**. This is usually because the existing alignment methods force samples with similar semantics to be projected to close positions in the latent space, even though they have significant differences across different domains. #### Main problems: 1. **Differences in domain distributions**: The core problem in UDA is to train a model to generalize in the case of different data distributions. When the data distributions of the source domain and the target domain are different, the performance of the model on the target domain will decline. 2. **Limitations of absolute coordinate alignment**: Traditional UDA methods attempt to align the representation space by minimizing the distribution differences between the source domain and the target domain. However, this method may lead to an overly general representation space and lose the domain - specific detailed features. 3. **Importance of relative positions**: Existing methods ignore the alignment of the relative positions of equivalent concepts in the latent space and focus more on the alignment of absolute coordinates. #### LAGUNA's solutions: LAGUNA (LAnguage Guided UNsupervised Adaptation with structured spaces) proposes a new method that shifts the focus from aligning absolute coordinates to aligning **the relative positions of equivalent concepts in the latent space**. Specifically, LAGUNA solves the problems in the following ways: - **Define domain - independent structures**: Define a domain - independent structure based on the semantic/geometric relationships of class labels in the language space. - **Guided adaptation**: Ensure that the organization of samples in the visual space reflects the inter - class relationships of the reference while preserving domain - specific features. - **Three - stage method**: - **Stage 1**: Construct a language - guided reference structure. - **Stage 2**: Train a language model to provide pseudo - labels for the unlabeled target domain and ensure that the internal text embeddings are aligned with the domain - independent structure defined in Stage 1. - **Stage 3**: Train a cross - domain visual classifier, learn domain - specific action anchors, and make its structure aligned with the domain - independent structure defined in Stage 1. Through this method, LAGUNA can outperform existing methods in multiple different domain adaptation scenarios, with a significant improvement in average accuracy. For example, on four datasets, namely DomainNet, GeoPlaces, GeoImnet, and EgoExo4D, LAGUNA achieved average accuracy improvements of +3.32%, +5.75%, +4.77%, and +1.94% respectively. ### Summary The main contributions of LAGUNA are: 1. Research on the advantages of using relative representations for UDA. 2. Propose LAGUNA, a three - stage method, to learn cross - domain classifiers, making the spaces of the source domain and the target domain both independent and aligned to a language - derived reference structure. 3. Prove the superiority of this method through extensive ablation experiments and comparison with the existing state - of - the - art models. LAGUNA effectively solves the problem of domain distribution differences in UDA by introducing the method of relative position alignment while preserving domain - specific features.