A One-Shot Lossless Algorithm for Cross-Cohort Learning in Mixed-Outcomes Analysis

Ruowang Li, Luke Benz, Rui Duan, Joshua Denny, Hakon Hakonarson, Jonathan Mosley, Jordan W Smoller, Wei-Qi Wei, Marylyn D Ritchie, Jason H Moore, Yong Chen
DOI: https://doi.org/10.1101/2024.01.09.24301073
2024-12-04
Abstract: In cross-cohort studies, integrating diverse datasets, such as electronic health records (EHRs), is both essential and challenging due to cohort-specific variations, distributed data storage, and data privacy concerns. Traditional methods often require data pooling or complex data harmonization, which can reduce efficiency and limit the scope of cross-cohort learning. We introduce mixWAS, a one-shot, lossless algorithm that efficiently integrates distributed EHR datasets via summary statistics. Unlike existing approaches, mixWAS preserves cohort-specific covariate associations and supports simultaneous mixed-outcome analyses. Simulations demonstrate that mixWAS outperforms conventional methods in accuracy and efficiency across various scenarios. Applied to EHR data from seven cohorts in the US, mixWAS identified 4,534 significant cross-cohort genetic associations among traits such as blood lipids, BMI, and circulatory diseases. Validation with an independent UK EHR dataset confirmed 97.7% of these associations, underscoring the algorithm's robustness. By enabling lossless cross-cohort integration, mixWAS improves the precision of multi-outcome analyses and expands the potential for actionable insights in healthcare research.