Inference for Non-Probability Samples under High-Dimensional Covariate-Adjusted Superpopulation Model

Pan Yingli, Cai Wen, Liu Zhan
DOI: https://doi.org/10.1007/s10260-021-00619-w
2022-01-01
Statistical Methods & Applications
Abstract: Non-probability samples have become increasingly popular in survey sampling because of their lower costs, shorter time durations, and higher efficiency. In the high-dimensional superpopulation modeling approach for non-probability samples, a model for the analysis variable is fitted from a non-probability sample and is then used to project the sample to the full population. In practice, there are situations in which the covariates in the modeling process are not directly observed but are contaminated by a multiplicative factor determined by the value of an unknown function of an observable confounder. In this paper, we propose calibrating the covariates by nonparametrically regressing the observable contaminated covariates on the confounder. We employ the SCAD-penalized least squares method to investigate the variable selection and inference problems for non-probability samples based on the calibrated covariates. A SCAD-penalized estimator of the parameter and an estimator of the population mean are obtained. Under some mild assumptions, we establish the "oracle property" of the proposed SCAD-penalized estimator and the consistency of the proposed population mean estimator. Simulation studies are conducted to assess the finite-sample performance of the proposed method. An application to a Boston housing price study demonstrates the utility of the proposed method in practice.
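The calibration step and the SCAD penalty mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common covariate-adjusted setup in which the observed covariate is x_obs = phi(U) * x, with E[phi(U)] = 1, U independent of x, and E[x] nonzero; none of these conditions is stated in the abstract. The Nadaraya-Watson smoother, the Gaussian kernel, the bandwidth `h`, and the function names `nw_smooth`, `calibrate`, and `scad_penalty` are illustrative choices. The penalty uses the standard SCAD form with the conventional default a = 3.7.

```python
import numpy as np

def nw_smooth(u, y, h):
    """Nadaraya-Watson estimate of E[y | U = u_i] at each sample point.

    Uses a Gaussian kernel with bandwidth h (an illustrative choice).
    """
    d = (u[:, None] - u[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return (w @ y) / w.sum(axis=1)

def calibrate(x_obs, u, h):
    """Approximately recover x from x_obs = phi(u) * x.

    Assumes E[phi(U)] = 1, U independent of x, and E[x] != 0, so that
    E[x_obs | U = u] = phi(u) * E[x] and E[x_obs] = E[x].
    """
    g_hat = nw_smooth(u, x_obs, h)        # estimates phi(u) * E[x]
    return x_obs * x_obs.mean() / g_hat   # divides out the estimated phi(u)

def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty (standard three-piece form, requires a > 2)."""
    b = np.abs(beta)
    linear = lam * b
    quadratic = (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    constant = lam ** 2 * (a + 1) / 2
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))
```

In a full pipeline under these assumptions, each contaminated covariate would be passed through `calibrate` before minimizing the least squares loss plus the summed `scad_penalty` of the coefficients. The fitted model would then be averaged over the population covariates to estimate the population mean.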