Abstract: Most current domain adaptation methods address either covariate shift or label shift, but are not applicable where they occur simultaneously and are confounded with each other. Domain adaptation approaches which do account for such confounding are designed to adapt covariates to optimally predict a particular label whose shift is confounded with covariate shift. In this paper, we instead seek to achieve general-purpose data backwards compatibility. This would allow the adapted covariates to be used for a variety of downstream problems, including on pre-existing prediction models and on data analytics tasks. To do this we consider a modification of generalized label shift (GLS), which we call confounded shift. We present a novel framework for this problem, based on minimizing the expected divergence between the source and target conditional distributions, conditioning on possible confounders. Within this framework, we provide concrete implementations using the Gaussian reverse Kullback-Leibler divergence and the maximum mean discrepancy. Finally, we demonstrate our approach on synthetic and real datasets.

### What problem does this paper attempt to solve?

This paper aims to solve the domain adaptation problem when covariate shift and label shift occur simultaneously and are confounded with each other. Most existing domain adaptation methods handle only one of these two shifts and cannot deal with the case where both are present and influence each other. Specifically, the author proposes a new framework to achieve general-purpose data backwards compatibility. This means that the adapted covariates can be used for multiple downstream tasks, including existing prediction models and data analysis tasks. To this end, the author introduces a modified version of generalized label shift (GLS), called **confounded shift**. Under this framework, the author adapts the data by minimizing the expected divergence between the conditional distributions of the source and target domains (conditioned on possible confounders).

### Main contributions

1. **The concept of confounded shift**: The author proposes a new concept, confounded shift, which allows the covariate and label distributions to differ between the source and target domains, but assumes that the conditional distribution can be made the same as that of the source domain by adapting the target covariates.
2. **New framework**: Based on minimizing the expected divergence between the conditional distributions of the source and target domains, the author provides a new framework and gives concrete implementations, using the Gaussian reverse Kullback-Leibler divergence (Gaussian reverse KLD) and the maximum mean discrepancy (MMD) as divergence functions.
3. **Application scenarios**: The author demonstrates the method on synthetic and real-world datasets, especially in biomedical fields such as EEG data.
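
The Gaussian reverse KLD named in contribution 2 can be illustrated with a minimal sketch. It assumes that both conditional distributions are summarized as multivariate Gaussians with known means and covariances, and it uses the standard closed-form KL divergence between Gaussians. The function names `gaussian_kl` and `gaussian_reverse_kl` are hypothetical and are not taken from the paper's code; the only convention borrowed from the summary is that the reverse KLD of \(P\) and \(Q\) is \(d_{\text{KL}}(Q \| P)\).

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) for multivariate Gaussians."""
    k = mu0.shape[0]
    diff = mu1 - mu0
    # trace(cov1^{-1} cov0), computed with a linear solve instead of an explicit inverse
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    # Mahalanobis-style term (mu1 - mu0)^T cov1^{-1} (mu1 - mu0)
    maha_term = diff @ np.linalg.solve(cov1, diff)
    # log-determinants via slogdet for numerical stability
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + maha_term - k + logdet1 - logdet0)

def gaussian_reverse_kl(mu_p, cov_p, mu_q, cov_q):
    """Reverse KLD of (P, Q), i.e. KL(Q || P), with P and Q both Gaussian."""
    return gaussian_kl(mu_q, cov_q, mu_p, cov_p)
```

Because the KL divergence is asymmetric, swapping the arguments generally changes the value, which is why the forward and reverse variants are distinguished.
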
### Summary of mathematical formulas

- **Gaussian reverse Kullback-Leibler divergence (Gaussian reverse KLD)**:
  \[ d_{\text{reverse-KLD}}(P, Q) = d_{\text{KL}}(Q \| P) \]
  where \(P\) and \(Q\) are the conditional distributions of the source and target domains respectively.
- **Maximum mean discrepancy (MMD)**:
  \[ \text{MMD}^2(D_T, D_S) = \mathbb{E}_{x_1, x_1' \sim D_T} k_X(x_1, x_1') - 2\,\mathbb{E}_{x_1 \sim D_T, x_2 \sim D_S} k_X(x_1, A x_2 + b) + \mathbb{E}_{x_2, x_2' \sim D_S} k_X(A x_2 + b, A x_2' + b) \]
  where \(k_X\) is a kernel on the covariates and \(x \mapsto A x + b\) is the affine map applied to the samples drawn from \(D_S\).

### Conclusion

By introducing the concept of confounded shift and a novel framework, this paper addresses a case that existing domain adaptation methods cannot handle: covariate shift and label shift that occur simultaneously and are confounded with each other. This provides a new route to general-purpose data backwards compatibility, making the adapted data applicable to multiple downstream tasks.
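
The MMD expression in the formula summary can be estimated from samples. The following is a rough sketch rather than the author's implementation: it assumes a Gaussian (RBF) kernel for \(k_X\) with a fixed bandwidth, uses the simple biased estimator that averages over all sample pairs, and evaluates the unconditional formula exactly as written, with the affine map \(A x + b\) applied to the \(D_S\) samples. The names `rbf_kernel` and `mmd2_affine` are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of X and rows of Y (kernel choice is an assumption)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2_affine(X_T, X_S, A, b, bandwidth=1.0):
    """Biased sample estimate of MMD^2 between X_T and the affinely mapped X_S."""
    X_S_adapted = X_S @ A.T + b  # applies x -> A x + b to every row
    k_tt = rbf_kernel(X_T, X_T, bandwidth).mean()
    k_ts = rbf_kernel(X_T, X_S_adapted, bandwidth).mean()
    k_ss = rbf_kernel(X_S_adapted, X_S_adapted, bandwidth).mean()
    return k_tt - 2.0 * k_ts + k_ss

# Usage: an affine map that reproduces X_T exactly should drive the estimate to about zero.
rng = np.random.default_rng(0)
X_S = rng.normal(size=(50, 2))
A_true = np.array([[2.0, 0.0], [0.0, 0.5]])
b_true = np.array([1.0, -1.0])
X_T = X_S @ A_true.T + b_true
mmd_matched = mmd2_affine(X_T, X_S, A_true, b_true)
mmd_identity = mmd2_affine(X_T, X_S, np.eye(2), np.zeros(2))
```

In an adaptation setting, `A` and `b` would be the parameters being optimized to minimize this quantity; the conditioning on confounders described in the summary is not part of this sketch.
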

Towards Backwards-Compatible Data with Confounded Domain Adaptation

Trust-aware Conditional Adversarial Domain Adaptation with Feature Norm Alignment.

Hidden Covariate Shift: A Minimal Assumption For Domain Adaptation

Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction

Adaptive Risk Minimization: Learning to Adapt to Domain Shift

Single-source domain adaptation with target and conditional shift

A Novel Domain Adaptation Theory with Jensen–Shannon Divergence

Embracing the disharmony in medical imaging: A Simple and effective framework for domain adaptation

Domain Adaptation for Time-Series Classification to Mitigate Covariate Shift

An introduction to domain adaptation and transfer learning

Beyond $\mathcal{H}$-Divergence: Domain Adaptation Theory With Jensen-Shannon Divergence

Align and Adapt: A Two-stage Adaptation Framework for Unsupervised Domain Adaptation

Label-Noise Robust Domain Adaptation

Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations

Proxy Methods for Domain Adaptation

Homologous Component Analysis for Domain Adaptation.

Domain Generalization via Causal Adjustment for Cross-Domain Sentiment Analysis

CDA: Contrastive-adversarial Domain Adaptation

Progressive Conservative Adaptation for Evolving Target Domains

On Learning Invariant Representation for Domain Adaptation