Federated deep learning enables cancer subtyping by proteomics
Zhaoxiang Cai, Emma L Boys, Zainab Noor, Adel Aref, Dylan Xavier, Natasha Lucas, Steven G Williams, Jennifer M S Koh, Rebecca C Poulos, Yangxiu Wu, Michael Dausmann, Karen L MacKenzie, Adriana Aguilar-Mahecha, Carolina Armengol, Maria M Barranco, Mark Basik, Elise D Bowman, Roderick J Clifton-Bligh, Elizabeth A Connolly, Wendy A Cooper, Bhavik Dalal, Anna DeFazio, Martin Filipits, Peter J Flynn, J Dinny Graham, Jacob George, Anthony J Gill, Michael Gnant, Rosemary Habib, Curtis C Harris, Kate Harvey, Lisa G Horvath, Christopher Jackson, Maija R J Kohonen-Corish, Elgene Lim, Jia Liu, Georgina Long, Reginald V Lord, Graham J Mann, Geoffrey W McCaughan, Lucy Morgan, Leigh C Murphy, Sumanth Nagabushan, Adnan M Nagrial, Jordi Navines, Benedict J Panizza, Jaswinder S Samra, Richard A Scolyer, Ioannis Souglakos, Alexander Swarbrick, David M Thomas, Rosemary L Balleine, Peter G Hains, Phillip J Robinson, Qing Zhong, Roger R Reddel
DOI: https://doi.org/10.1101/2024.10.16.618763
2024-10-19
Abstract: Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a Federated Deep Learning (FDL) approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n=1,260) and 29 cohorts held behind private firewalls (n=6,265), representing 19,930 replicate data-independent acquisition mass spectrometry (DIA-MS) runs. Local parameter updates were aggregated to build the global model, which achieved a 43% performance gain over local models on the hold-out test set (n=625) across 14 cancer subtyping tasks and matched the performance of a centralized model. The approach's generalizability was demonstrated by retraining the global model with data from two external DIA-MS cohorts (n=55) and eight cohorts acquired by tandem mass tag (TMT) proteomics (n=832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, e.g., for discovering predictive biomarkers or treatment targets, while maintaining data privacy.
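The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation rule ProCanFDL uses, so the sample-size-weighted averaging below (in the style of federated averaging) is an assumption, and the function name `federated_average`, the flat parameter lists, and the toy site sizes are all hypothetical.

```python
from typing import List, Sequence


def federated_average(
    local_params: Sequence[Sequence[float]],
    sample_counts: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one global vector.

    Assumption: each site's contribution is weighted by its number of
    training samples. This is an illustrative rule, not necessarily the
    one used by ProCanFDL.
    """
    if not local_params or len(local_params) != len(sample_counts):
        raise ValueError("need one sample count per site")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(local_params[0])
    global_params = [0.0] * dim
    for params, count in zip(local_params, sample_counts):
        if len(params) != dim:
            raise ValueError("all sites must share the same parameter shape")
        weight = count / total
        for i, value in enumerate(params):
            global_params[i] += weight * value
    return global_params


# Toy example: two hypothetical sites with 1 and 3 samples respectively.
site_params = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [1, 3]
global_model = federated_average(site_params, site_sizes)
```

Only the parameter vectors and sample counts leave each site in this sketch; the underlying proteomic data stay local, which is the privacy property the abstract emphasizes.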
Cancer Biology