Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science
Deepak R. Unni,Sierra A. T. Moxon,Michael Bada,Matthew Brush,Richard Bruskiewich,J. Harry Caufield,Paul A. Clemons,Vlado Dancik,Michel Dumontier,Karamarie Fecho,Gustavo Glusman,Jennifer J. Hadlock,Nomi L. Harris,Arpita Joshi,Tim Putman,Guangrong Qin,Stephen A. Ramsey,Kent A. Shefchek,Harold Solbrig,Karthik Soman,Anne E. Thessen,Melissa A. Haendel,Chris Bizon,Christopher J. Mungall,The Biomedical Data Translator Consortium,Liliana Acevedo,Stanley C. Ahalt,John Alden,Ahmed Alkanaq,Nada Amin,Ricardo Avila,Jim Balhoff,Sergio E. Baranzini,Andrew Baumgartner,William Baumgartner,Basazin Belhu,MacKenzie Brandes,Namdi Brandon,Noel Burtt,William Byrd,Jackson Callaghan,Marco Alvarado Cano,Steven Carrell,Remzi Celebi,James Champion,Zhehuan Chen,Mei‐Jan Chen,Lawrence Chung,Kevin Cohen,Tom Conlin,Dan Corkill,Maria Costanzo,Steven Cox,Andrew Crouse,Camerron Crowder,Mary E. Crumbley,Cheng Dai,Vlado Dančík,Ricardo De Miranda Azevedo,Eric Deutsch,Jennifer Dougherty,Marc P. Duby,Venkata Duvvuri,Stephen Edwards,Vincent Emonet,Nathaniel Fehrmann,Jason Flannick,Aleksandra M. Foksinska,Vicki Gardner,Edgar Gatica,Amy Glen,Prateek Goel,Joseph Gormley,Alon Greyber,Perry Haaland,Kristina Hanspers,Kaiwen He,Kaiwen He,Jeff Henrickson,Eugene W. Hinderer,Maureen Hoatlin,Andrew Hoffman,Sui Huang,Conrad Huang,Robert Hubal,Kenneth Huellas‐Bruskiewicz,Forest B. Huls,Lawrence Hunter,Greg Hyde,Tursynay Issabekova,Matthew Jarrell,Lindsay Jenkins,Adam Johs,Jimin Kang,Richa Kanwar,Yaphet Kebede,Keum Joo Kim,Alexandria Kluge,Michael Knowles,Ryan Koesterer,Daniel Korn,David Koslicki,Ashok Krishnamurthy,Lindsey Kvarfordt,Jay Lee,Margaret Leigh,Jason Lin,Zheng Liu,Shaopeng Liu,Chunyu Ma,Andrew Magis,Tarun Mamidi,Meisha Mandal,Michelle Mantilla,Jeffrey Massung,Denise Mauldin,Jason McClelland,Julie McMurry,Philip Mease,Luis Mendoza,Marian Mersmann,Abrar Mesbah,Matthew Might,Kenny Morton,Sandrine Muller,Arun Teja Muluka,John Osborne,Phil Owen,Michael Patton,David B. Peden,R. Carter Peene,Bria Persaud,Emily Pfaff,Alexander Pico,Elizabeth Pollard,Guthrie Price,Shruti Raj,Jason Reilly,Anders Riutta,Jared Roach,Ryan T. Roper,Greg Rosenblatt,Irit Rubin,Sienna Rucka,Nathaniel Rudavsky‐Brody,Rayn Sakaguchi,Eugene Santos,Kevin Schaper,Charles P. Schmitt,Shepherd Schurman,Erik Scott,Sarah Seitanakis,Priya Sharma,Ilya Shmulevich,Manil Shrestha,Shalki Shrivastava,Meghamala Sinha,Brett Smith,Noel Southall,Nicholas Southern,Lisa Stillwell,Michael Strasser,Andrew I. Su,Casey Ta,Anne E. Thessen,Jillian Tinglin,Lucas Tonstad,Thi Tran‐Nguyen,Alexander Tropsha,Gaurav Vaidya,Luke Veenhuis,Adam Viola,Marcin von Grotthuss,Max Wang,Patrick Wang,Paul B. Watkins,Rosina Weber,Qi Wei,Chunhua Weng,Jordan Whitlock,Mark D. Williams,Andrew Williams,Finn Womack,Erica Wood,Chunlei Wu,Jiwen Kevin Xin,Hao Xu,Colleen Xu,Chase Yakaboski,Yao Yao,Hong Yi,Arif Yilmaz,Marissa Zheng,Xinghua Zhou,Eric Zhou,Qian Zhu,Tom Zisk,Michael "Michi" Strasser,Marcin Grotthuss
DOI: https://doi.org/10.1111/cts.13302
2022-06-08
Clinical and Translational Science
Abstract: Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph‐based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open‐access model for standardization across biomedical KGs have left the task of reconciling data sources to downstream consumers. Biolink Model is an open‐source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object‐oriented classification and graph‐oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates) representing biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.
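To illustrate the structure the abstract describes (hierarchical categories, predicates between them, and associations that constrain which entities may be linked), the following is a minimal Python sketch. It is not the Biolink Model implementation or its tooling: the category hierarchy is a simplified toy, the `ASSOCIATIONS` table and the helper names (`ancestors`, `is_valid`) are hypothetical, and the `EX:` identifiers are placeholders. Only the general idea of CURIE-style names such as `biolink:Gene` and `biolink:related_to` is taken from the model's conventions.

```python
from dataclasses import dataclass

# Toy category hierarchy (child -> parent). This is a simplified assumption
# for illustration, not the actual Biolink class tree.
PARENT = {
    "biolink:Gene": "biolink:BiologicalEntity",
    "biolink:Disease": "biolink:BiologicalEntity",
    "biolink:BiologicalEntity": "biolink:NamedThing",
}


def ancestors(category):
    """Return the category itself followed by all of its ancestors."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


@dataclass(frozen=True)
class Node:
    id: str        # identifier of the entity (placeholder CURIEs below)
    category: str  # the class (category) the entity belongs to


@dataclass(frozen=True)
class Edge:
    subject: Node
    predicate: str  # the relationship type
    object: Node


# Hypothetical association table:
# predicate -> (required subject category, required object category)
ASSOCIATIONS = {
    "biolink:gene_associated_with_condition": ("biolink:Gene", "biolink:Disease"),
    "biolink:related_to": ("biolink:NamedThing", "biolink:NamedThing"),
}


def is_valid(edge):
    """Check that an edge's endpoints satisfy the predicate's category constraints."""
    if edge.predicate not in ASSOCIATIONS:
        return False
    subject_cat, object_cat = ASSOCIATIONS[edge.predicate]
    return (
        subject_cat in ancestors(edge.subject.category)
        and object_cat in ancestors(edge.object.category)
    )


gene = Node("EX:gene1", "biolink:Gene")           # placeholder identifier
disease = Node("EX:disease1", "biolink:Disease")  # placeholder identifier

good = Edge(gene, "biolink:gene_associated_with_condition", disease)
reversed_edge = Edge(disease, "biolink:gene_associated_with_condition", gene)
generic = Edge(disease, "biolink:related_to", gene)
```

In this sketch, `is_valid(good)` holds because the subject is a gene and the object is a disease, `is_valid(reversed_edge)` fails because the endpoints are swapped, and `is_valid(generic)` holds because both categories descend from the toy root `biolink:NamedThing`. The hierarchy lookup is what lets a constraint stated on a general category apply to all of its more specific descendants.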
medicine, research & experimental