Adaptive Distributed Inference for Multi-source Massive Heterogeneous Data

Xin Yang, Qi Jing Yan, Mi Xia Wu
DOI: https://doi.org/10.1007/s10114-024-2524-4
2024-01-01
Abstract: In this paper, we consider distributed inference for heterogeneous linear models with massive datasets. Noting that heterogeneity may exist not only in the expectations of the subpopulations but also in their variances, we propose the heteroscedasticity-adaptive distributed aggregation (HADA) estimation, which is shown to be communication-efficient and asymptotically optimal under both homoscedasticity and heteroscedasticity. Furthermore, a distributed test for parameter heterogeneity across subpopulations is constructed based on the HADA estimator. The finite-sample performance of the proposed methods is evaluated using simulation studies and the NYC flight data.
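The abstract does not spell out the HADA estimator, so the following is only a minimal sketch of the general idea it alludes to: each data source sends small summary statistics rather than raw data, and a central node combines them with weights that adapt to each source's error variance. The model (a shared slope vector with a source-specific intercept and source-specific noise variance), the function names `local_summary` and `aggregate`, and the simulated data are all assumptions made for illustration; this is a generic inverse-variance-weighted aggregation, not the authors' method.

```python
import numpy as np

def local_summary(X, y):
    """Run on one source: return small summary statistics, not raw data.

    Assumed model (hypothetical): y = intercept_k + X @ beta + noise_k,
    where intercept_k and Var(noise_k) may differ across sources.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # centering absorbs the source-specific intercept
    yc = y - y.mean()
    G = Xc.T @ Xc                      # p x p Gram matrix
    b = Xc.T @ yc                      # p-vector of cross-products
    beta_local = np.linalg.solve(G, b)
    resid = yc - Xc @ beta_local
    sigma2 = resid @ resid / (n - p - 1)   # local error-variance estimate
    return G, b, sigma2

def aggregate(summaries):
    """Central node: combine source summaries, weighting each by 1 / sigma2."""
    p = summaries[0][0].shape[0]
    A = np.zeros((p, p))
    c = np.zeros(p)
    for G, b, sigma2 in summaries:
        A += G / sigma2
        c += b / sigma2
    beta = np.linalg.solve(A, c)
    cov = np.linalg.inv(A)             # approximate covariance of the pooled slope
    return beta, cov

# Simulated example: three sources with different intercepts and noise levels.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0])
summaries = []
for intercept, sigma in [(0.0, 0.5), (3.0, 2.0), (-1.0, 1.0)]:
    X = rng.normal(size=(5000, 2))
    y = intercept + X @ beta_true + sigma * rng.normal(size=5000)
    summaries.append(local_summary(X, y))

beta_hat, cov_hat = aggregate(summaries)
print(beta_hat)
```

Only a p x p matrix, a p-vector, and one scalar leave each source, which is the sense in which such a scheme is communication-efficient. When all sources share the same variance, the weights cancel out of the solve and the result is the same pooled least-squares estimate that unweighted aggregation would give.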