Optimal subsampling for modal regression in massive data

Yue Chao, Lei Huang, Xuejun Ma, Jiajun Sun
DOI: https://doi.org/10.1007/s00184-023-00916-2
IF: 0.96
2023-06-29
Metrika
Abstract: Many modern statistical analysis research efforts are focused on solving the limited computational resources problem that arises when dealing with large datasets. One popular and effective method to address this challenge is to obtain informative subdata from the full dataset based on optimal subsampling probabilities. In this article, we present an optimal subsampling approach for big data modal regression from the perspective of minimizing asymptotic mean squared error. The estimation procedure is achieved by running a two-step algorithm based on the modal expectation-maximization algorithm when the bandwidth for the modal regression is not related to the subsample size. Under certain regularity conditions, we investigate the consistency and asymptotic normality of the subsample-based estimator given the full data. Furthermore, an optimal bandwidth selection approach within this framework is also investigated. Simulation studies demonstrate that our proposed subsampling method performs well in the context of big data modal regression. Empirical evaluation is also conducted using real data.
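To make the two-step idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear modal regression model with a Gaussian kernel and a fixed bandwidth `h`. The abstract does not state the optimal subsampling probabilities, so the sketch uses gradient-norm probabilities mixed with a uniform component as an assumed stand-in. The function names `mem_fit` and `two_step_subsample` and the parameter `mix` are hypothetical.

```python
import numpy as np


def mem_fit(X, y, h, weights=None, beta0=None, n_iter=200, tol=1e-8):
    """Weighted linear modal regression via a modal EM (MEM) iteration.

    Uses a Gaussian kernel with fixed bandwidth h. `weights` are
    inverse sampling probabilities (all ones if None).
    """
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # Start from ordinary least squares unless an initial value is supplied.
    beta = np.linalg.lstsq(X, y, rcond=None)[0] if beta0 is None else beta0.copy()
    for _ in range(n_iter):
        resid = y - X @ beta
        # E-step: kernel weights scaled by the sampling weights, then normalized.
        k = w * np.exp(-0.5 * (resid / h) ** 2)
        k /= k.sum()
        # M-step: weighted least squares with the E-step weights.
        XtW = X.T * k
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.linalg.norm(beta_new - beta) < tol
        beta = beta_new
        if converged:
            break
    return beta


def two_step_subsample(X, y, h, r0, r, rng, mix=0.1):
    """Two-step subsampling estimator (illustrative probabilities only)."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample of size r0 gives a pilot estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta_pilot = mem_fit(X[idx0], y[idx0], h)
    # Step 2: ASSUMED probabilities, proportional to the norm of each
    # point's kernel-objective gradient at the pilot estimate, mixed with
    # a uniform part so that no probability is zero.
    resid = y - X @ beta_pilot
    score = np.abs(resid) * np.exp(-0.5 * (resid / h) ** 2) * np.linalg.norm(X, axis=1)
    p = (1.0 - mix) * score / score.sum() + mix / n
    idx = rng.choice(n, size=r, replace=True, p=p)
    # Refit on the second-step subsample with inverse-probability weights.
    return mem_fit(X[idx], y[idx], h, weights=1.0 / p[idx], beta0=beta_pilot)
```

A call such as `two_step_subsample(X, y, h=1.0, r0=500, r=2000, rng=np.random.default_rng(0))` returns a coefficient vector computed from `r0 + r` sampled rows rather than from all rows of `X`. Only the probability computation in step 2 passes over the full data.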
statistics & probability