Abstract: Consider a scenario where we have access to train data with both covariates and outcomes while test data only contains covariates. In this scenario, our primary aim is to predict the missing outcomes of the test data. With this objective in mind, we train parametric regression models under a covariate shift, where covariate distributions are different between the train and test data. For this problem, existing studies have proposed covariate shift adaptation via importance weighting using the density ratio. This approach averages the train data losses, each weighted by an estimated ratio of the covariate densities between the train and test data, to approximate the test-data risk. Although it allows us to obtain a test-data risk minimizer, its performance heavily relies on the accuracy of the density ratio estimation. Moreover, even if the density ratio can be consistently estimated, the estimation errors of the density ratio also yield bias in the estimators of the regression model's parameters of interest. To mitigate these challenges, we introduce a doubly robust estimator for covariate shift adaptation via importance weighting, which incorporates an additional estimator for the regression function. Leveraging double machine learning techniques, our estimator reduces the bias arising from the density ratio estimation errors. We demonstrate the asymptotic distribution of the regression parameter estimator. Notably, our estimator remains consistent if either the density ratio estimator or the regression function is consistent, showcasing its robustness against potential errors in density ratio estimation. Finally, we confirm the soundness of our proposed method via simulation studies.
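
The importance-weighting step described in the abstract rests on a change-of-measure identity. As a sketch in assumed notation (\(p\) and \(q\) for the training and test covariate densities, \(\ell\) for the loss, \(f_\theta\) for the parametric model, \(n\) for the training sample size; these symbols are illustrative and do not appear in the abstract itself), and assuming that the conditional distribution of the outcome given the covariates is shared between the two data sets and that \(p(x) > 0\) wherever \(q(x) > 0\):

\[
\mathbb{E}_{q}\big[\ell(Y, f_\theta(X))\big]
= \mathbb{E}_{p}\big[r_0(X)\,\ell(Y, f_\theta(X))\big],
\qquad r_0(x) = \frac{q(x)}{p(x)},
\]

\[
\widehat{R}_{\mathrm{IW}}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \hat{r}(X_i)\,\ell\big(Y_i, f_\theta(X_i)\big).
\]

The second line replaces \(r_0\) with an estimate \(\hat{r}\), which is why errors in \(\hat{r}\) carry over to the fitted parameters.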

What problem does this paper attempt to address?

### The problem that the paper attempts to solve

This paper addresses the prediction of missing outcomes in test data under covariate shift. Specifically, the training data contain both covariates and outcomes, while the test data contain only covariates, and the goal is to predict the test outcomes accurately.

### Main challenges

1. **Different covariate distributions**: The covariate distributions of the training and test data differ, so a risk estimated on the training data does not directly reflect the risk on the test data.
2. **Error in density ratio estimation**: Existing methods usually adapt to covariate shift through importance weighting, which depends on an estimate of the density ratio. Errors in that estimate bias the estimated parameters of the regression model.
3. **Model misspecification**: When the regression model is misspecified, prediction performance can deteriorate severely.

### Solutions

To address these challenges, the paper proposes a doubly robust (DR) covariate-shift adaptation method with the following main features:

1. **Double robustness**: The DR estimator remains consistent as long as either the density ratio estimator or the conditional expectation estimator is consistent, even if the other is not.
2. **Bias reduction**: By applying double machine learning (DML) techniques, the method reduces the bias caused by density ratio estimation error.
3. **Asymptotic normality**: The paper establishes the asymptotic normality of the regression parameter estimator, with a \(\sqrt{n}\) convergence rate.

### Method overview

1. **Importance weighting**: Estimate the density ratio \(r_0(x)=\frac{q(x)}{p(x)}\), where \(p\) and \(q\) denote the training and test covariate densities, and weight the training losses by it to approximate the test-data risk.
2. **Density ratio estimation**: Use methods such as Least-Squares Importance Fitting (LSIF) to estimate the density ratio.
3. **Doubly robust estimator**: Combine the density ratio estimate with a conditional expectation estimate to construct a doubly robust estimator that reduces the bias caused by density ratio estimation error.
4. **Self-debiased estimator**: For cases where no other model is available to estimate the conditional expectation, a self-debiased (SDB) estimator is proposed that uses only the model of interest and the density ratio model.

### Experimental verification

The paper verifies the proposed method through simulation experiments. The results show that the DR method outperforms ordinary least squares (OLS), weighted least squares (WLS), and non-parametric regression (NP) under model misspecification and covariate shift.

### Conclusion

The proposed doubly robust covariate-shift adaptation method gives more accurate predictions under model misspecification and covariate shift, and it has good theoretical properties.
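
The importance-weighting and doubly robust steps in the method overview can be illustrated with a small sketch. This is a simplified illustration for a linear working model with squared loss, not the paper's exact estimator: it omits the cross-fitting used in double machine learning, the demo plugs in the known density ratio of two Gaussians instead of an LSIF estimate, and it uses a cubic polynomial fit as a stand-in for the regression-function estimator. The function names `weighted_least_squares`, `dr_linear_fit`, and `add_intercept` are hypothetical.

```python
import numpy as np


def weighted_least_squares(X, y, w):
    """Importance-weighted least squares: argmin_b sum_i w_i (y_i - x_i^T b)^2."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


def dr_linear_fit(X_tr, y_tr, X_te, r_hat, g_tr, g_te):
    """Doubly robust moment estimator for a linear working model (sketch).

    Solves for b the estimating equation
        mean_test[x (g(x) - x^T b)] + mean_train[r(x) x (y - g(x))] = 0,
    where r_hat holds density-ratio values at the training points and
    g_tr / g_te hold regression-function predictions at the training and
    test points. No cross-fitting is performed (assumption of this sketch).
    """
    n, m = len(X_tr), len(X_te)
    A = X_te.T @ X_te / m                            # test second-moment matrix
    plug_in = X_te.T @ g_te / m                      # plug-in term on test covariates
    correction = X_tr.T @ (r_hat * (y_tr - g_tr)) / n  # weighted residual correction
    return np.linalg.solve(A, plug_in + correction)


def add_intercept(x):
    """Stack a column of ones next to a 1-D covariate vector."""
    return np.column_stack([np.ones_like(x), x])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = m = 5000
    x_tr = rng.normal(0.0, 1.0, n)       # training covariates ~ N(0, 1)
    x_te = rng.normal(0.5, 1.0, m)       # test covariates ~ N(0.5, 1)
    y_tr = x_tr + 0.5 * x_tr**2 + rng.normal(0.0, 0.1, n)  # nonlinear truth

    # Known ratio q(x)/p(x) for N(0.5, 1) over N(0, 1); in practice this
    # would be estimated, for example with LSIF.
    r_true = np.exp(0.5 * x_tr - 0.125)

    # Stand-in regression-function estimate: cubic polynomial fit.
    coef = np.polyfit(x_tr, y_tr, deg=3)
    g_tr, g_te = np.polyval(coef, x_tr), np.polyval(coef, x_te)

    X_tr, X_te = add_intercept(x_tr), add_intercept(x_te)
    print("OLS:", weighted_least_squares(X_tr, y_tr, np.ones(n)))
    print("IW :", weighted_least_squares(X_tr, y_tr, r_true))
    print("DR :", dr_linear_fit(X_tr, y_tr, X_te, r_true, g_tr, g_te))
```

The design point is the sum of two terms in `dr_linear_fit`: if the regression-function predictions are accurate, the weighted residual correction averages to roughly zero regardless of the weights, and if the weights are accurate, the correction offsets the error of the plug-in term. That is the sense in which the estimator is doubly robust.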

Double Debiased Covariate Shift Adaptation Robust to Density-Ratio Estimation

Robust Covariate Shift Adaptation for Density-Ratio Estimation

Adapting to Continuous Covariate Shift via Online Density Ratio Estimation

Double-Weighting for Covariate Shift Adaptation

A One-step Approach to Covariate Shift Adaptation

Discriminative Density-ratio Estimation

Nearest Neighbor Sampling for Covariate Shift Adaptation

Doubly robust calibration of prediction sets under covariate shift

Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift

Covariate-Shift Generalization Via Random Sample Weighting

Optimally tackling covariate shift in RKHS-based nonparametric regression

Transfer Learning under Covariate Shift: Local $k$-Nearest Neighbours Regression with Heavy-Tailed Design

Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift

Beyond Reweighting: On the Predictive Role of Covariate Shift in Effect Generalization

Inference for High-Dimensional Linear Expectile Regression with De-Biasing Method

The covariate-adjusted residual estimator and its use in both randomized trials and observational settings

Estimating the Density Ratio between Distributions with High Discrepancy using Multinomial Logistic Regression

Treatment effect estimation under covariate-adaptive randomization with heavy-tailed outcomes

Distributionally Robust Safe Sample Elimination under Covariate Shift

Estimation of prediction error with known covariate shift

Conformal Predictive Systems Under Covariate Shift