dsMTL - a computational framework for privacy-preserving, distributed multi-task machine learning

Han Cao, Youcheng Zhang, Jan Baumbach, Paul R Burton, Dominic Dwyer, Nikolaos Koutsouleris, Julian Matschinske, Yannick Marcon, Sivanesan Rajan, Thilo Rieg, Patricia Ryser-Welch, Julian Späth, Carl Herrmann, Emanuel Schwarz
DOI: https://doi.org/10.1101/2021.08.26.457778
2021-08-28
Abstract: Multitask learning allows the simultaneous learning of multiple ‘communicating’ algorithms. It is increasingly adopted for biomedical applications, such as the modeling of disease progression. As data protection regulations limit data sharing for such analyses, an implementation of multitask learning on geographically distributed data sources would be highly desirable. Here, we describe the development of dsMTL, a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. dsMTL is implemented as a library for the R programming language and builds on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. We provide a comparative evaluation of dsMTL for the identification of biological signatures in distributed datasets using two case studies, and evaluate the computational performance of the supervised and unsupervised algorithms. dsMTL provides an easy-to-use framework for privacy-preserving, federated analysis of geographically distributed datasets, and has several application areas, including comorbidity modeling and translational research focused on the simultaneous prediction of different outcomes across datasets. dsMTL is available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package).