CanDIG: Secure Federated Genomic Queries and Analyses Across Jurisdictions

L. Jonathan Dursi, Zoltan Bozoky, Richard de Borja, Jimmy Li, David Bujold, Adam Lipski, Shaikh Farhan Rashid, Amanjeev Sethi, Neelam Memon, Dashaylan Naidoo, Felipe Coral-Sasso, Matthew Wong, P-O Quirion, Zhibin Lu, Samarth Agarwal, Kat Pavlov, Andrew Ponomarev, Mia Husic, Krista Pace, Samantha L. Palmer, Stephanie A. Grover, Sevan Hakgor, Lillian L. Siu, David Malkin, Carl Virtanen, Trevor J. Pugh, Pierre-Étienne Jacques, Yann Joly, Steven J. M. Jones, Guillaume Bourque, Michael Brudno
DOI: https://doi.org/10.1101/2021.03.30.434101
2021-03-31
Abstract: The rapid expansion of bioinformatics and computational biology has broadened the collection and use of -omics data, including genomic, transcriptomic, methylomic and a myriad of other health data types, in both the clinic and the laboratory. Clinical and research uses of such data alike require co-analysis with large datasets, for which participant privacy and the need for data custodian controls must remain paramount. This is particularly challenging in multi-jurisdictional settings such as Canada, where health privacy and security requirements are often heterogeneous. Data federation offers a solution, allowing large datasets from multiple sites to be integrated and analyzed while each site abides by its local policies. The Canadian Distributed Infrastructure for Genomics platform (CanDIG) enables federated querying and analysis of -omics and health data while keeping that data local and under local control. It builds upon existing infrastructures to connect five health and research institutions across Canada, relies heavily on standards and tooling brought together by the Global Alliance for Genomics and Health (GA4GH), implements a clear division of responsibilities among its participants and adheres to international data-sharing standards. Participating researchers and clinicians can therefore contribute to, and quickly access, a critical mass of -omics data across a national network in a manner that takes into account the multi-jurisdictional nature of our privacy and security policies. In this way, CanDIG gives medical and research communities the tools needed to use and analyze the ever-growing amount of -omics data available to them, in order to improve our understanding and treatment of various conditions and diseases.
CanDIG is being used to make genomic and phenotypic data available for querying across Canada as part of data sharing for five leading pan-Canadian projects, including the Terry Fox Comprehensive Cancer Care Centre Consortium Network (TF4CN) and Terry Fox PRecision Oncology For Young peopLE (PROFYLE), and to make data from provincial projects such as POG (Personalized OncoGenomics) more widely available.
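The federation model described in the abstract, where each site answers queries locally under its own policy and only results leave the site, can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration only: the `Site` class, the `count_matching`, `federated_count` and `total` functions, the `variant_genes` record field, and the per-user allow-list standing in for a real local access policy. It is not CanDIG's API, which builds on GA4GH standards and tooling rather than anything shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical participating site; its records never leave this object."""
    name: str
    records: list[dict]
    # Hypothetical stand-in for a local access policy: users this site permits.
    authorized_users: set[str] = field(default_factory=set)

    def count_matching(self, user: str, gene: str) -> int | None:
        """Evaluate the query locally and return only an aggregate count.

        Returns None when the site's local policy denies the requesting user.
        """
        if user not in self.authorized_users:
            return None
        return sum(1 for r in self.records if gene in r.get("variant_genes", ()))


def federated_count(sites: list[Site], user: str, gene: str) -> dict[str, int | None]:
    """Send the same query to every site and collect the per-site answers."""
    return {site.name: site.count_matching(user, gene) for site in sites}


def total(results: dict[str, int | None]) -> int:
    """Sum the counts from sites that answered; denied sites contribute nothing."""
    return sum(c for c in results.values() if c is not None)
```

For example, with one site that authorizes only `alice` and another that authorizes both `alice` and `bob`, a query by `bob` returns a count from the second site and `None` from the first, so each site's own policy decides what it shares. Returning counts rather than records is one simple way to keep data local; a real deployment would also need authentication, auditing and safeguards against re-identification from small counts.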