Robust optimal subsampling based on weighted asymmetric least squares

Min Ren,Shengli Zhao,Mingqiu Wang,Xinbei Zhu
DOI: https://doi.org/10.1007/s00362-023-01480-7
2023-09-20
Statistical Papers
Abstract:With the development of contemporary science, a large amount of generated data includes heterogeneity and outliers in the response and/or covariates. Furthermore, subsampling is an effective method to overcome the limitation of computational resources. However, when data include heterogeneity and outliers, incorrect subsampling probabilities may select inferior subdata, and statistic inference on this subdata may have a far inferior performance. Combining the asymmetric least squares and estimation, this paper proposes a double-robustness framework (DRF), which can simultaneously tackle the heterogeneity and outliers in the response and/or covariates. The Poisson subsampling is implemented based on the DRF for massive data, and a more robust probability will be derived to select the subdata. Under some regularity conditions, we establish the asymptotic properties of the subsampling estimator based on the DRF. Numerical studies and actual data demonstrate the effectiveness of the proposed method.
statistics & probability
What problem does this paper attempt to address?