Two-stage Communication-Efficient Distributed Sparse M-estimation with Missing Data

Xudong Zhang, Ting Zhang, Lei Wang
DOI: https://doi.org/10.1080/02331888.2023.2201505
IF: 2.346
2023-01-01
Statistics
Abstract: Distributed estimation based on observations from different sources has drawn attention in modern statistical learning. When the distributed data are missing at random, we propose a two-stage L1-penalized communication-efficient surrogate likelihood (CSL) algorithm based on inverse probability weighting (IPW). It eliminates the estimation bias caused by the missing data and, at the same time, constructs a sparse distributed M-estimator. In the first stage, we consider a parametric propensity model and directly apply the L1-penalized CSL method to obtain an efficient and sparse distributed estimator of the propensity parameter. In the second stage, we construct an IPW-based L1-penalized CSL loss function to eliminate the bias and obtain the sparse M-estimator. The finite-sample performance of the estimators is studied through simulation, and an application to a house sale prices data set is also presented.
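The two stages described in the abstract can be illustrated with a small Python sketch. It is a minimal illustration under assumptions that are not taken from the paper: a logistic propensity model, a squared-error loss standing in for the general M-estimation loss, a single CSL round with machine 0 as the central node, a local lasso fit as the initial estimator theta_bar, and proximal gradient descent (ISTA) as the L1 solver. On the central machine the sketch minimizes the CSL surrogate L_0(theta) - <grad L_0(theta_bar) - grad L_N(theta_bar), theta> + lam * ||theta||_1, where L_0 is the local loss and L_N the average loss over all machines. In stage 2 each observation's loss is weighted by delta_i / pi_hat_i. The simulated data, the auxiliary variable z that drives missingness, the tuning values, and all function names (csl_l1, stage2, simulate, and so on) are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def soft_threshold(z, t):
    # Proximal operator of a coordinate-wise weighted L1 penalty
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def logistic_grad(theta, X, y, w):
    # Gradient of the w-weighted average logistic loss
    return X.T @ (w * (sigmoid(X @ theta) - y)) / len(y)


def squared_grad(theta, X, y, w):
    # Gradient of the w-weighted average of 0.5 * (y - x'theta)^2
    return -X.T @ (w * (y - X @ theta)) / len(y)


def csl_l1(grad, shards, theta_bar, lam, iters=2000):
    """One round of L1-penalized CSL with machine 0 as the central node (hypothetical helper).

    shards: list of (X_k, y_k, w_k), one per machine. Minimizes
      L_0(theta) - <grad L_0(theta_bar) - grad L_N(theta_bar), theta> + lam * ||theta[1:]||_1
    by proximal gradient (ISTA); the intercept theta[0] is not penalized.
    """
    sizes = np.array([len(y) for _, y, _ in shards], dtype=float)
    # The only communication: each machine sends its local gradient at theta_bar
    local = [grad(theta_bar, X, y, w) for X, y, w in shards]
    global_grad = sum(n * g for n, g in zip(sizes, local)) / sizes.sum()
    shift = local[0] - global_grad
    X0, y0, w0 = shards[0]
    # Step 1/L from the weighted Gram matrix; a safe bound for both losses above
    step = 1.0 / np.linalg.norm((X0 * w0[:, None]).T @ X0 / len(y0), 2)
    penalty = step * lam * np.r_[0.0, np.ones(len(theta_bar) - 1)]
    theta = theta_bar.copy()
    for _ in range(iters):
        theta = soft_threshold(theta - step * (grad(theta, X0, y0, w0) - shift), penalty)
    return theta


def simulate(rng, n, p):
    # Hypothetical MAR design: the always-observed auxiliary z is correlated with
    # the regression error, and missingness of y depends only on z
    X = rng.standard_normal((n, p))
    eps = rng.standard_normal(n)
    z = 0.8 * eps + 0.6 * rng.standard_normal(n)
    beta = np.r_[1.0, 2.0, -1.5, 1.0, np.zeros(p - 3)]  # intercept first
    Xd = np.column_stack([np.ones(n), X])
    delta = (rng.random(n) < sigmoid(0.5 + 1.5 * z)).astype(float)
    y = np.where(delta == 1, Xd @ beta + eps, 0.0)  # missing responses stored as 0
    return Xd, y, delta, np.column_stack([Xd, z])  # last item: propensity design


def stage2(machines, weights, lam=0.1):
    # Weighted L1-penalized CSL: local lasso start, then one CSL round
    shards = [(Xd, y, w) for (Xd, y, _, _), w in zip(machines, weights)]
    beta0 = csl_l1(squared_grad, shards[:1], np.zeros(shards[0][0].shape[1]), lam)
    return csl_l1(squared_grad, shards, beta0, lam)


rng = np.random.default_rng(0)
machines = [simulate(rng, 1000, 10) for _ in range(5)]

# Stage 1: L1-penalized CSL for the logistic propensity model of delta given (x, z)
prop = [(D, d, np.ones(len(d))) for _, _, d, D in machines]
gamma0 = csl_l1(logistic_grad, prop[:1], np.zeros(prop[0][0].shape[1]), lam=0.02)
gamma_hat = csl_l1(logistic_grad, prop, gamma0, lam=0.02)

# Stage 2: IPW weights delta / pi_hat versus unweighted complete-case analysis
beta_ipw = stage2(machines, [d / sigmoid(D @ gamma_hat) for _, _, d, D in machines])
beta_cc = stage2(machines, [d for _, _, d, _ in machines])
print("gamma_hat:", np.round(gamma_hat, 2))
print("IPW beta:", np.round(beta_ipw, 2))
print("complete-case beta:", np.round(beta_cc, 2))
```

Comparing beta_ipw with beta_cc shows what the weights are for. In this hypothetical design, selection depends on z, which is correlated with the regression error, so the complete-case intercept should be pulled upward while the IPW fit should stay near the true value of 1. Additional CSL rounds can be run by passing the returned estimate back in as theta_bar.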