The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment

Melissa A Haendel, Christopher G Chute, Tellen D Bennett, David A Eichmann, Justin Guinney, Warren A Kibbe, Philip R O Payne, Emily R Pfaff, Peter N Robinson, Joel H Saltz, Heidi Spratt, Christine Suver, John Wilbanks, Adam B Wilcox, Andrew E Williams, Chunlei Wu, Clair Blacketer, Robert L Bradford, James J Cimino, Marshall Clark, Evan W Colmenares, Patricia A Francis, Davera Gabriel, Alexis Graves, Raju Hemadri, Stephanie S Hong, George Hripcsak, Dazhi Jiao, Jeffrey G Klann, Kristin Kostka, Adam M Lee, Harold P Lehmann, Lora Lingrey, Robert T Miller, Michele Morris, Shawn N Murphy, Karthik Natarajan, Matvey B Palchuk, Usman Sheikh, Harold Solbrig, Shyam Visweswaran, Anita Walden, Kellie M Walters, Griffin M Weber, Xiaohan Tanner Zhang, Richard L Zhu, Benjamin Amor, Andrew T Girvin, Amin Manna, Nabeel Qureshi, Michael G Kurilla, Sam G Michael, Lili M Portilla, Joni L Rutter, Christopher P Austin, Ken R Gersing, Shaymaa Al-Shukri, Adil Alaoui, Ahmad Baghal, Pamela D Banning, Edward M Barbour, Michael J Becich, Afshin Beheshti, Gordon R Bernard, Sharmodeep Bhattacharyya, Mark M Bissell, L Ebony Boulware, Samuel Bozzette, Donald E Brown, John B Buse, Brian J Bush, Tiffany J Callahan, Thomas R Campion, Elena Casiraghi, Ammar A Chaudhry, Guanhua Chen, Anjun Chen, Gari D Clifford, Megan P Coffee, Tom Conlin, Connor Cook, Keith A Crandall, Mariam Deacy, Racquel R Dietz, Nicholas J Dobbins, Peter L Elkin, Peter J Embi, Julio C Facelli, Karamarie Fecho, Xue Feng, Randi E Foraker, Tamas S Gal, Linqiang Ge, George Golovko, Ramkiran Gouripeddi, Casey S Greene, Sangeeta Gupta, Ashish Gupta, Janos G Hajagos, David A Hanauer, Jeremy Richard Harper, Nomi L Harris, Paul A Harris, Mehadi R Hassan, Yongqun He, Elaine L Hill, Maureen E Hoatlin, Kristi L Holmes, LaRon Hughes, Randeep S Jawa, Guoqian Jiang, Xia Jing, Marcin P Joachimiak, Steven G Johnson, Rishikesan Kamaleswaran, Thomas George Kannampallil, Andrew S Kanter, Ramakanth Kavuluru, Kamil Khanipov, Hadi Kharrazi, Dongkyu Kim, Boyd M Knosp, Arunkumar Krishnan, Tahsin Kurc, Albert M Lai, Christophe G Lambert, Michael Larionov, Stephen B Lee, Michael D Lesh, Olivier Lichtarge, John Liu, Sijia Liu, Hongfang Liu, Johanna J Loomba, Sandeep K Mallipattu, Chaitanya K Mamillapalli, Christopher E Mason, Jomol P Mathew, James C McClay, Julie A McMurry, Paras P Mehta, Ofer Mendelevitch, Stephane Meystre, Richard A Moffitt, Jason H Moore, Hiroki Morizono, Christopher J Mungall, Monica C Munoz-Torres, Andrew J Neumann, Xia Ning, Jennifer E Nyland, Lisa O'Keefe, Anna O'Malley, Shawn T O'Neil, Jihad S Obeid, Elizabeth L Ogburn, Jimmy Phuong, Jose D Posada, Prateek Prasanna, Fred Prior, Justin Prosser, Amanda Lienau Purnell, Ali Rahnavard, Harish Ramadas, Justin T Reese, Jennifer L Robinson, Daniel L Rubin, Cody D Rutherford, Eugene M Sadhu, Amit Saha, Mary Morrison Saltz, Thomas Schaffter, Titus KL Schleyer, Soko Setoguchi, Nigam H Shah, Noha Sharafeldin, Evan Sholle, Jonathan C Silverstein, Anthony Solomonides, Julian Solway, Jing Su, Vignesh Subbian, Hyo Jung Tak, Bradley W Taylor, Anne E Thessen, Jason A Thomas, Umit Topaloglu, Deepak R Unni, Joshua T Vogelstein, Andréa M Volz, David A Williams, Kelli M Wilson, Clark B Xu, Hua Xu, Yao Yan, Elizabeth Zak, Lanjing Zhang, Chengda Zhang, Jingyi Zheng
DOI: https://doi.org/10.1093/jamia/ocaa196
2020-08-17
Journal of the American Medical Informatics Association
Abstract

Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.

Materials and Methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.

Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.

Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
Subject categories: Information Science & Library Science; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Health Care Sciences & Services; Medical Informatics