Sampling Importance Resampling Algorithm with Nonignorable Missing Response Variable Based on Smoothed Quantile Regression

Jingxuan Guo, Fuguo Liu, Wolfgang Karl Härdle, Xueliang Zhang, Kai Wang, Ting Zeng, Liping Yang, Maozai Tian
DOI: https://doi.org/10.3390/math11244906
IF: 2.4
2023-12-08
Mathematics
Abstract: The presence of nonignorable missing response variables often leads to complex conditional distribution patterns that cannot be effectively captured through mean regression. In contrast, quantile regression offers valuable insights into the conditional distribution. Consequently, this article places emphasis on the quantile regression approach to address nonrandom missing data. Taking inspiration from fractional imputation, this paper proposes a novel smoothed quantile regression estimation equation based on a sampling importance resampling (SIR) algorithm instead of nonparametric kernel regression methods. Additionally, we present an augmented inverse probability weighting (AIPW) smoothed quantile regression estimation equation to reduce the influence of potential misspecification in a working model. The consistency and asymptotic normality of the empirical likelihood estimators corresponding to the above estimating equations are proven under the assumption of a correctly specified parameter working model. Furthermore, we demonstrate that the AIPW estimation equation converges to an IPW estimation equation when a parameter working model is misspecified, thus illustrating the robustness of the AIPW estimation approach. Through numerical simulations, we examine the finite sample properties of the proposed method when the working models are both correctly specified and misspecified. Finally, we apply the proposed method to analyze HIV-CD4 data, thereby exploring variations in treatment effects and the influence of other covariates across different quantiles.
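As a minimal illustration of how quantile regression targets a chosen quantile rather than the mean, the sketch below minimises the average check loss \(\rho_{\tau}(u)=u(\tau - I(u<0))\) over a single constant and recovers a sample quantile. It assumes fully observed data and deliberately leaves out missingness, smoothing, and the SIR step; the names `check_loss` and `fit_constant_quantile` are illustrative and do not come from the paper.

```python
import numpy as np


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    # Weight is tau for non-negative residuals and tau - 1 for negative ones.
    return u * np.where(u < 0, tau - 1.0, tau)


def fit_constant_quantile(y, tau):
    """Minimise the mean check loss over candidate constants c in {y_1, ..., y_n}.

    The mean check loss is piecewise linear and convex in c, so a minimiser
    can always be found among the observed values.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).mean() for c in y]
    return y[int(np.argmin(losses))]


# Simulated, fully observed responses (an assumption made for illustration).
rng = np.random.default_rng(0)
y = rng.normal(size=501)

median_hat = fit_constant_quantile(y, 0.5)  # targets the sample median
q90_hat = fit_constant_quantile(y, 0.9)     # targets the upper tail
```

Replacing the constant `c` with a linear predictor \(Z_i^{\top}\theta\) gives the usual linear quantile regression objective; the brute-force search here is only for readability and would be replaced by a proper optimiser in practice.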
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the challenges that nonignorable missing response variables pose for statistical analysis. When the data contain nonrandom missing values, traditional mean regression cannot effectively capture the complex patterns of the conditional distribution, whereas quantile regression can provide valuable insights into it. The paper therefore focuses on how to use quantile regression to handle nonrandom missing data.

### Main contributions

1. **Propose a new smoothed quantile regression estimating equation**:
   - The new estimating equation is based on the sampling importance resampling (SIR) algorithm rather than on nonparametric kernel regression.
   - This method can reduce the impact of potential model misspecification when dealing with nonrandom missing data.
2. **Introduce the augmented inverse probability weighted (AIPW) smoothed quantile regression estimating equation**:
   - The AIPW method further reduces the impact of working model misspecification and improves the robustness of the estimate.
3. **Proof of theoretical properties**:
   - Prove the consistency and asymptotic normality of the empirical likelihood estimator under the assumption that the working model is correctly specified.
   - Demonstrate that when the working model is misspecified, the AIPW estimating equation converges to the IPW estimating equation, thus showing the robustness of the AIPW estimation method.
4. **Numerical simulation and practical application**:
   - Numerical simulations examine the finite-sample properties of the proposed method under both correctly specified and misspecified working models.
   - The proposed method is applied to the HIV-CD4 data set to explore how treatment effects change across quantiles and how other covariates influence the response.

### Key formulas

1. **Linear quantile regression model**:

   \[ Y_i = Z_i^{\top}\theta_{\tau}+\epsilon_i, \quad i = 1,\ldots,n \]

   where \(Y_i\) is the response variable, \(Z_i\) is a completely observed \(q\)-dimensional covariate vector, \(\theta_{\tau}\) is an unknown vector of regression coefficients, \(\epsilon_i\) is a random error term satisfying \(P(\epsilon_i\leq 0 \mid Z_i)=\tau\), and \(\tau\in(0, 1)\).

2. **Quantile regression estimator**:

   \[ \hat{\theta}=\arg\min_{\theta\in\Theta}\frac{1}{n}\sum_{i = 1}^n\rho_{\tau}(Y_i - Z_i^{\top}\theta) \]

   where \(\rho_{\tau}(u)=u(\tau - I(u < 0))\) is the check function and \(I(\cdot)\) is the indicator function.

3. **SIR-based smoothed quantile regression estimating equation**:

   \[ \psi_{eei}^h(Y_i, Z_i,\delta_i;\theta,\beta,\gamma)=\delta_i\psi_h(Z_i, Y_i;\theta)+(1 - \delta_i)\frac{1}{M}\sum_{j = 1}^M\psi_h(Z_i, Y_i^{(j)}(\beta,\gamma);\theta) \]

4. **AIPW smoothed quantile regression estimating equation**:

   \[ \psi_{aipw}^h(Y_i, Z_i,\delta_i;\theta,\beta,\gamma)=\fr