Semi-supervised learning for various comparison functions across two populations

Zhang, Menghua
DOI: https://doi.org/10.1007/s00362-024-01632-3
2024-12-14
Statistical Papers
Abstract: Estimating comparison functions is crucial in numerous domains, such as econometrics, clinical medicine, and public health, where evaluating the effectiveness of interventions or treatment effects is a central concern. In many scenarios the response variables are much more expensive to collect than the covariates. To tackle the resulting scarcity of labeled data, we present a unified semi-supervised learning (SSL) framework for estimating comparison functions, such as the difference in means or in event probabilities between two independent samples and the survival competition probability. The framework leverages unlabeled data, which contain only covariate observations, to improve estimation accuracy. Specifically, we propose a class of efficient and adaptive estimators for comparison functions that effectively use both the labeled and the unlabeled data under the semi-supervised (SS) framework. We establish the consistency and asymptotic normality of the proposed estimators and provide the optimal weight that yields the most efficient estimator. Furthermore, the resulting estimator is shown to be semiparametric efficient if the working model is correctly specified. Extensive numerical simulations confirm the consistency and efficiency of the proposed estimators. An application to real data extracted from the 2001 Medical Expenditure Panel Survey (MEPS) is also included.
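To make the idea of "leveraging unlabeled covariates" concrete, the sketch below shows a generic regression-adjusted (control-variate) estimator of a difference in means between two independent populations. It is an illustration under stated assumptions, not the paper's estimator: it assumes a single covariate, a linear working model fitted by ordinary least squares on the labeled data, and a fixed adjustment weight of 1. It omits the optimal weight and the other comparison functions (event probabilities, survival competition probability) covered by the paper. All function names are hypothetical.

```python
from statistics import fmean


def ols_slope(x, y):
    """Least-squares slope of y on a single covariate x (hypothetical helper)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def ss_mean(x_lab, y_lab, x_unlab):
    """Semi-supervised mean of Y for one population (illustrative sketch).

    Starts from the labeled-sample mean of Y and corrects it using the gap
    between the covariate mean over all data (labeled + unlabeled) and the
    covariate mean over the labeled data, scaled by a linear working-model slope.
    """
    x_all = list(x_lab) + list(x_unlab)
    b = ols_slope(x_lab, y_lab)
    return fmean(y_lab) + b * (fmean(x_all) - fmean(x_lab))


def ss_mean_difference(group1, group2):
    """Difference in semi-supervised means between two independent populations.

    Each group is a tuple (x_labeled, y_labeled, x_unlabeled).
    """
    return ss_mean(*group1) - ss_mean(*group2)


# Usage: toy data where Y is exactly linear in X within each population.
g1 = ([0, 1, 2, 3], [1, 3, 5, 7], [4, 5, 6, 7])  # y = 2x + 1
g2 = ([0, 2], [0, 2], [4])                        # y = x
print(ss_mean_difference(g1, g2))  # → 6.0
```

On these toy data the naive labeled-only difference is 4.0 - 1.0 = 3.0, because the labeled subsets happen to have smaller covariate values than the full samples. The regression correction uses the unlabeled covariates to recover the full-sample difference of 6.0. When the working model is misspecified, an adjustment of this kind remains consistent (the labeled and pooled covariate means share the same limit), but its efficiency gain shrinks. This mirrors the abstract's distinction between consistency in general and semiparametric efficiency under a correctly specified working model.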
statistics & probability