Subsampling approach for least squares fitting of semi-parametric accelerated failure time models to massive survival data

Zehan Yang, HaiYing Wang, Jun Yan
DOI: https://doi.org/10.1007/s11222-024-10391-y
IF: 2.3241
2024-02-15
Statistics and Computing
Abstract: Massive survival data are increasingly common in many research fields, and subsampling is a practical strategy for analyzing such data. Although optimal subsampling strategies have been developed for Cox models, little has been done for semi-parametric accelerated failure time (AFT) models due to the challenges posed by non-smooth estimating functions for the regression coefficients. We develop optimal subsampling algorithms for fitting semi-parametric AFT models using the least-squares approach. By efficiently estimating the slope matrix of the non-smooth estimating functions using a resampling approach, we construct optimal subsampling probabilities for the observations. For feasible point and interval estimation of the unknown coefficients, we propose a two-step method, drawing multiple subsamples in the second stage to correct for overestimation of the variance in higher censoring scenarios. We validate the performance of our estimators through a simulation study that compares single and multiple subsampling methods, and apply the methods to analyze the survival time of lymphoma patients in the Surveillance, Epidemiology, and End Results program.
statistics & probability; computer science, theory & methods