Distributed Estimation for Large-Scale Cox Regression with Poisson Subsampling

Haixiang Zhang, Yang Li, HaiYing Wang
DOI: https://doi.org/10.48550/arxiv.2310.08208
2023-01-01
Abstract: To ensure privacy protection and alleviate computational burden, we propose a Poisson-subsampling-based distributed estimation procedure for the Cox model with massive survival datasets from multi-center, decentralized sources. The proposed estimator is computed using optimal subsampling probabilities that we derive, and it requires only one round of communication, in which subsample-based summary-level statistics are transmitted between storage sites. For inference, the asymptotic properties of the proposed estimator are rigorously established. An extensive simulation study demonstrates that the proposed approach is effective. The methodology is applied to a large dataset from U.S. airlines.
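The abstract names Poisson subsampling without spelling it out. As a minimal sketch of the general technique (not the authors' procedure): each observation is kept independently with its own inclusion probability, and each kept observation carries an inverse-probability weight for use in a weighted estimating equation. The function names `poisson_subsample` and `uniform_probs` are hypothetical, and the uniform probabilities are only a stand-in for the optimal subsampling probabilities derived in the paper, whose form the abstract does not give.

```python
import random


def uniform_probs(n, expected_size):
    """Placeholder inclusion probabilities (assumption: uniform, NOT the paper's optimal ones)."""
    p = min(1.0, expected_size / n)
    return [p] * n


def poisson_subsample(n, probs, rng):
    """Keep each index i independently with probability probs[i].

    Returns a list of (index, weight) pairs, where weight = 1 / probs[i]
    is the usual inverse-probability weight. The subsample size is random.
    """
    sample = []
    for i in range(n):
        # One independent Bernoulli draw per observation; no draw depends on another.
        if rng.random() < probs[i]:
            sample.append((i, 1.0 / probs[i]))
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    sample = poisson_subsample(n, uniform_probs(n, 100), rng)
    print(len(sample), "observations kept out of", n)
```

Because every inclusion decision is an independent draw, a site can subsample its own records in a single pass without coordinating a fixed sample size with other sites, which is one reason this scheme is a natural fit for decentralized data.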