Statistical Inference with Nonignorable Non-Probability Survey Samples

Yang Liu, Meng Yuan, Pengfei Li, Changbao Wu
2024-10-04
Abstract: Statistical inference with non-probability survey samples is an emerging topic in survey sampling and official statistics and has gained increased attention from researchers and practitioners in the field. Much of the existing literature, however, assumes that the participation mechanism for non-probability samples is ignorable. In this paper, we develop a pseudo-likelihood approach to estimate participation probabilities for nonignorable non-probability samples when auxiliary information is available from an existing reference probability sample. We further construct three estimators for the finite population mean using regression-based prediction, inverse probability weighting (IPW), and augmented IPW estimators, and study their asymptotic properties. Variance estimation for the proposed methods is considered within the same framework. The efficiency of our proposed methods is demonstrated through simulation studies and a real data analysis using the ESPACOV survey on the effects of the COVID-19 pandemic in Spain.
Methodology, Statistics Theory
What problem does this paper attempt to address?
This paper addresses how to make valid statistical inferences from non-probability samples when the participation mechanism is nonignorable. Specifically, when auxiliary information is available from an existing reference probability sample, the paper proposes a pseudo-likelihood method to estimate the participation probabilities for the non-probability sample and then constructs three estimators of the finite population mean: the regression-based prediction estimator, the inverse probability weighting (IPW) estimator, and the augmented IPW (AIPW) estimator. The paper also studies the asymptotic properties of these estimators and considers variance estimation within the same framework.

### Background and Motivation

Probability sampling is traditionally regarded as the gold standard in survey sampling. With the advent of the big data era, however, non-probability samples (such as web-panel surveys and administrative records) have become increasingly common because they are convenient and cheap to collect. Their main challenge is that the participation mechanism is unknown, which can introduce selection bias and undermine the validity of estimation and inference.

Most of the existing literature assumes that the participation mechanism of non-probability samples is ignorable, that is, that the participation probability does not depend on the response variable of interest given the observed covariates. In practice this assumption often fails. For example, in studies of the impact of the COVID-19 pandemic on people's emotions, evidence shows that a good mood is positively correlated with willingness to participate in the survey.

### Main Contributions

1. **Model Identification Conditions**: Established model identification conditions under two assumptions similar to those of Kim and Morikawa (2023).
2. **Pseudo-Likelihood Method**: Proposed a new pseudo-likelihood method to estimate the participation probability under the assumed nonignorable participation mechanism.
3. **Estimator Development**: Developed population mean estimators based on regression, IPW, and AIPW, and addressed variance estimation for these estimators.

### Methods and Results

- **Parameter Identification**: Discussed the identification of the model parameters and proposed a pseudo-likelihood method to estimate them.
- **Estimator Performance**: Demonstrated the performance of the proposed estimators through simulation studies and a real data analysis using the ESPACOV survey. The results show that when the participation mechanism is nonignorable, the proposed estimators significantly reduce the bias and perform well in various settings.
- **Variance Estimation**: Proposed a plug-in variance estimator and verified its validity under different sample sizes.

### Conclusion

The proposed methods handle the nonignorable participation mechanism in non-probability samples well, reducing bias and improving the accuracy of estimation. They are particularly relevant in practice when working with non-probability samples from big data sources.
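To make the effect of a nonignorable participation mechanism concrete, here is a minimal simulation sketch. It is not the paper's procedure. The population model, the logistic participation model, the sample sizes, and the function name `compare_estimators` are all assumptions made for illustration. In particular, the true participation probabilities are plugged in directly instead of being estimated by the pseudo-likelihood method, and the regression-based prediction estimator is omitted. The sketch compares the naive sample mean with a Hájek-type IPW estimator and an AIPW estimator that borrows a simple random reference sample for the prediction term.

```python
import numpy as np


def compare_estimators(seed=2024, N=100_000, n_B=5_000):
    """Toy comparison of naive, IPW and AIPW estimators of a population mean.

    All modelling choices here are illustrative assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical finite population: covariate x and response y.
    x = rng.normal(size=N)
    y = 1.0 + 2.0 * x + rng.normal(size=N)
    truth = y.mean()

    # Nonignorable participation: the probability depends on y itself.
    pi = 1.0 / (1.0 + np.exp(-(-2.0 + 0.4 * y)))
    in_A = rng.random(N) < pi  # membership in the non-probability sample A
    x_A, y_A, pi_A = x[in_A], y[in_A], pi[in_A]

    # Reference probability sample B: simple random sample with design weights N / n_B.
    # Only x is used from B, mirroring the "auxiliary information" setting.
    idx_B = rng.choice(N, size=n_B, replace=False)
    d_B = np.full(n_B, N / n_B)

    # Naive estimator: unweighted mean of A, which ignores the selection mechanism.
    naive = y_A.mean()

    # Hajek-type IPW estimator using the (here, known) participation probabilities.
    ipw = np.sum(y_A / pi_A) / np.sum(1.0 / pi_A)

    # Working linear outcome model fitted on A (np.polyfit returns slope first).
    slope, intercept = np.polyfit(x_A, y_A, deg=1)
    m_A = intercept + slope * x_A
    m_B = intercept + slope * x[idx_B]

    # AIPW: weighted residuals from A plus design-weighted predictions from B.
    aipw = (np.sum((y_A - m_A) / pi_A) + np.sum(d_B * m_B)) / N

    return {"truth": truth, "naive": naive, "ipw": ipw, "aipw": aipw}


if __name__ == "__main__":
    for name, value in compare_estimators().items():
        print(f"{name:>5}: {value:.3f}")
```

Because participation becomes more likely as `y` grows, the naive mean in this toy setup overshoots the population mean, while the two weighted estimators stay close to it. With estimated rather than known participation probabilities, the quality of the weighted estimators would depend on how well the participation model is identified and fitted, which is the problem the paper's pseudo-likelihood approach targets.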