Adjusted Logistic Propensity Weighting Methods for Population Inference using Nonprobability Volunteer-Based Epidemiologic Cohorts

Lingxiao Wang, Richard Valliant, Yan Li
DOI: https://doi.org/10.48550/arXiv.2007.02476
2021-02-26
Abstract: Many epidemiologic studies forgo probability sampling and turn to nonprobability volunteer-based samples because of cost, response burden, and invasiveness of biological samples. However, finite population inference is difficult to make from the nonprobability samples due to the lack of population representativeness. Aiming for making inferences at the population level using nonprobability samples, various inverse propensity score weighting (IPSW) methods have been studied with the propensity defined by the participation rate of population units in the nonprobability sample. In this paper, we propose an adjusted logistic propensity weighting (ALP) method to estimate the participation rates for nonprobability sample units. Compared to existing IPSW methods, the proposed ALP method is easy to implement by ready-to-use software while producing approximately unbiased estimators for population quantities regardless of the nonprobability sample rate. The efficiency of the ALP estimator can be further improved by scaling the survey sample weights in propensity estimation. Taylor linearization variance estimators are proposed for ALP estimators of finite population means that account for all sources of variability. The proposed ALP methods are evaluated numerically via simulation studies and empirically using the naïve unweighted National Health and Nutrition Examination Survey III sample, while taking the 1997 National Health Interview Survey as the reference, to estimate the 15-year mortality rates.
Applications, Methodology
What problem does this paper attempt to address?
The paper addresses the bias that arises when population-level inferences are drawn from nonprobability samples, such as volunteer-based cohorts, that do not represent the target population. Specifically, the authors propose an adjusted logistic propensity weighting (ALP) method that estimates the participation rates of nonprobability sample units, with the aim of reducing bias and improving estimation efficiency.

### Background

In epidemiologic studies, cost, response burden, and the invasiveness of biological sample collection often lead researchers to use nonprobability samples such as volunteer samples. These samples often represent the target population poorly, so estimates computed from them are biased. For example, the all-cause mortality estimate in the UK Biobank is only half of that in the UK population as a whole, which indicates that the Biobank sample is not representative with respect to many sociodemographic, physical, lifestyle, and health-related characteristics.

### Research Objectives

To support valid population inference from nonprobability samples, the paper proposes the ALP method, which aims to:

1. **Estimate the participation rate of nonprobability sample units**: Estimate the participation rate of each nonprobability sample unit by fitting a logistic regression model.
2. **Reduce bias**: Compared with existing inverse propensity score weighting (IPSW) methods, the ALP method produces approximately unbiased estimators of population quantities, regardless of the participation rate of the nonprobability sample.
3. **Improve efficiency**: Further improve estimation efficiency by scaling the survey sample weights used in propensity estimation.

### Methods

1. **Basic Setup**:
   - Define a finite population \( F_P=\{1,\cdots,N\} \) of size \( N \).
   - Assume that a volunteer nonprobability sample \( s_c \) of size \( n_c \) is selected from \( F_P \) through a self-selection mechanism.
   - Define the participation rate of unit \( i \) in the nonprobability sample as \( \pi_i^{(c)} = P(i\in s_c|F_P)=E_c[\delta_i^{(c)}|y_i,x_i] \), where \( \delta_i^{(c)} \) indicates whether \( s_c \) contains unit \( i \), \( y_i \) is the outcome variable, and \( x_i \) is the vector of self-selection variables.
   - The corresponding implicit nonprobability sample weight is \( w_i = 1/\pi_i^{(c)} \).
2. **Existing Methods**:
   - **Redesignated Design Weight Method (RDW)**: Estimate the participation rate by fitting a logistic regression model and recalibrating the probability sample weights.
   - **Chen et al.'s Method (CLW)**: Estimate the participation rate from a rewritten log-likelihood function, without requiring the conditions that the RDW method relies on.
3. **Adjusted Logistic Propensity Weighting Method (ALP)**:
   - Construct a pseudo-population \( s_c^*\cup F_P \), where \( s_c^* \) is a copy of \( s_c \).
   - Model \( p_i \), the probability that unit \( i \) of the pseudo-population belongs to \( s_c^* \), as a function of \( \pi_i^{(c)} \):
     \[ p_i=\frac{\pi_i^{(c)}}{1 + \pi_i^{(c)}}\quad\text{or equivalently}\quad\pi_i^{(c)}=\frac{p_i}{1 - p_i} \]
   - Estimate \( p_i \) by fitting a logistic regression model \( \log\left(\frac{p_i}{1 - p_i}\right)=\beta^T x_i \), and then obtain \( \pi_i^{(c)} \) from the relation above, which gives \( \pi_i^{(c)}(\beta)=\exp(\beta^T x_i) \).
   - Use \( w_i^{ALP}=1/\pi_i^{(c)}(\hat{\beta}) \) as the pseudo-weight to estimate the population mean.

### Results

Through simulation studies, the authors evaluated the performance of the ALP method in different scenarios. The results show that: