Cross-validation Approaches for Multi-study Predictions

Boyu Ren, Prasad Patil, Francesca Dominici, Giovanni Parmigiani, Lorenzo Trippa
2024-07-22
Abstract: We consider prediction in multiple studies with potential differences in the relationships between predictors and outcomes. Our objective is to integrate data from multiple studies to develop prediction models for unseen studies. We propose and investigate two cross-validation approaches applicable to multi-study stacking, an ensemble method that linearly combines study-specific ensemble members to produce generalizable predictions. Among our cross-validation approaches are some that avoid reusing the same data in both the training and stacking steps, in contrast to earlier multi-study stacking, which reuses it. We prove that under mild regularity conditions the proposed cross-validation approaches produce stacked prediction functions with oracle properties. We also identify analytically the scenarios in which the proposed cross-validation approaches increase prediction accuracy compared to stacking with data reuse. We perform a simulation study to illustrate these results. Finally, we apply our method to predicting mortality from long-term exposure to air pollutants, using collections of datasets.
Methodology, Statistics Theory
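
As a rough illustration of the idea in the abstract, the sketch below trains one linear model per study and then fits stacking weights on the pooled predictions. It is not the authors' implementation: the function names (`fit_ols`, `out_of_fold_predictions`, `stack_without_reuse`, `stacked_predict`), the use of ordinary least squares for both the ensemble members and the stacking step, the choice of within-study K-fold predictions as the way to avoid data reuse, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def out_of_fold_predictions(X, y, n_folds, rng):
    """Predict each row with a model fit on the other folds of the same study."""
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    preds = np.empty(len(X))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        preds[held_out] = predict_ols(fit_ols(X[train], y[train]), X[held_out])
    return preds

def stack_without_reuse(studies, n_folds=5, seed=0):
    """Fit one member per study, then stacking weights on pooled predictions.

    Assumption for this sketch: when member s predicts on its own study s,
    its in-sample predictions are replaced by out-of-fold predictions, so no
    observation is used both to train a member and to weight that member.
    """
    rng = np.random.default_rng(seed)
    members = [fit_ols(X, y) for X, y in studies]
    rows, targets = [], []
    for s, (X, y) in enumerate(studies):
        Z = np.column_stack([predict_ols(b, X) for b in members])
        Z[:, s] = out_of_fold_predictions(X, y, n_folds, rng)
        rows.append(Z)
        targets.append(y)
    # Unconstrained least-squares stacking weights (an assumed choice).
    weights, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(targets), rcond=None)
    return members, weights

def stacked_predict(members, weights, X):
    """Linear combination of the study-specific members' predictions."""
    return np.column_stack([predict_ols(b, X) for b in members]) @ weights

# Simulated studies whose coefficients vary around a common mean (made-up data).
rng = np.random.default_rng(1)
studies = []
for _ in range(4):
    X = rng.normal(size=(60, 3))
    beta_k = np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=3)
    studies.append((X, X @ beta_k + rng.normal(scale=0.5, size=60)))

members, weights = stack_without_reuse(studies)
new_study_X = rng.normal(size=(10, 3))
new_study_pred = stacked_predict(members, weights, new_study_X)
```

Stacking with data reuse corresponds to dropping the line that overwrites `Z[:, s]`, so that each member is weighted using its in-sample predictions on its own study.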