Solving the missing at random problem in semi‐supervised learning: An inverse probability weighting method

Jin Su, Shuyi Zhang, Yong Zhou
DOI: https://doi.org/10.1002/sta4.707
2024-06-24
Stat
Abstract: We propose an estimator for the population mean $\theta_0 = E(Y)$ in the semi‐supervised learning setting under the Missing at Random (MAR) assumption. This setting assumes that the probability of observing $Y$, denoted by $\pi_M^*$, depends on the total sample size $M$ and satisfies $\pi_M^* = o(1)$. To estimate $\theta_0$ efficiently, we introduce an adaptive estimator based on inverse probability weighting (IPW) and cross‐fitting. Theoretical analysis shows that the proposed estimator is consistent and efficient, with convergence rate $\sqrt{M\pi_M^*}$. This is slower than the typical $\sqrt{M}$ rate because the proportion of labelled data diminishes as the sample size $M$ increases in the semi‐supervised setting. We also prove the consistency of IPW–Nadaraya–Watson density function estimators. Extensive simulations and an application to the Los Angeles homeless data validate the effectiveness of our approach.
statistics & probability
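To make cross-fitted inverse probability weighting concrete, here is a minimal sketch in Python with NumPy. It is not the authors' adaptive estimator. It assumes a plain Horvitz–Thompson-type IPW mean, $\hat\theta = M^{-1}\sum_{i=1}^{M} R_i Y_i / \hat\pi(X_i)$, where $R_i = 1$ if $Y_i$ is observed and the propensity $\pi(X) = P(R = 1 \mid X)$ is modelled by logistic regression and estimated by K-fold cross-fitting. The helper names (`fit_logistic`, `predict_logistic`, `crossfit_ipw_mean`), the number of folds, the propensity floor and the simulated data-generating process are all hypothetical choices for illustration. Only about 5% of units are labelled, loosely mimicking a small $\pi_M^*$.

```python
import numpy as np


def fit_logistic(X, r, n_iter=50, tol=1e-10):
    """Hypothetical helper: fit P(R=1 | X) with an intercept by Newton-Raphson."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (r - p)                          # score of the log-likelihood
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])   # observed information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_logistic(beta, X):
    """Hypothetical helper: predicted propensities pi_hat(X)."""
    Z = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Z @ beta))


def crossfit_ipw_mean(X, r, y, n_folds=5, seed=0, floor=1e-6):
    """Hypothetical sketch: IPW estimate of E(Y) with cross-fitted propensities.

    y may be NaN wherever r == 0 (unlabelled units).
    """
    M = len(r)
    folds = np.array_split(np.random.default_rng(seed).permutation(M), n_folds)
    pi_hat = np.empty(M)
    for idx in folds:
        train = np.ones(M, dtype=bool)
        train[idx] = False                            # fit on the other folds only
        beta = fit_logistic(X[train], r[train])
        pi_hat[idx] = predict_logistic(beta, X[idx])  # predict on the held-out fold
    pi_hat = np.clip(pi_hat, floor, 1.0)              # assumed safeguard against tiny propensities
    y_obs = np.where(r == 1, y, 0.0)                  # unlabelled responses contribute 0
    return np.mean(r * y_obs / pi_hat)


# Illustrative simulation (assumed design, not the paper's): true theta_0 = E(Y) = 1.
rng = np.random.default_rng(2024)
M = 50_000
X = rng.normal(size=M)
y_full = 1.0 + 2.0 * X + rng.normal(size=M)
pi_true = 1.0 / (1.0 + np.exp(-(-3.0 + 0.5 * X)))   # MAR: depends on X only
r = (rng.uniform(size=M) < pi_true).astype(float)
y = np.where(r == 1, y_full, np.nan)                 # Y is unobserved for unlabelled units
theta_hat = crossfit_ipw_mean(X, r, y)
print(f"labelled fraction: {r.mean():.3f}, IPW estimate of E(Y): {theta_hat:.3f}")
```

Cross-fitting ensures that the weight $1/\hat\pi(X_i)$ for each unit comes from a propensity model fitted without that unit. Because only about $M\pi_M^*$ responses are observed, the standard error of an IPW estimator of this kind scales roughly like $1/\sqrt{M\pi_M^*}$ rather than $1/\sqrt{M}$, consistent with the slower rate stated in the abstract.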