Semantic computing for human phenotypes

Robert Gentleman, Rafael Goncalves, Vincent Carey
DOI: https://doi.org/10.1101/2024.06.21.599369
2024-06-28
Abstract: In many fields, research progress may be hindered by indefiniteness of language used to describe experimental conditions and outcomes. Harmonization of data resources generated by independent groups is important for integrative analysis. Adoption of formal ontologies and vocabularies for experiment annotation should help with harmonization tasks, but the use of ontologies also suffers from a lack of definiteness. In this study we explore how natural language characterization of human diseases coupled with ontologic mapping of study outcome terminology can be used to integrate information from multiple studies of genetic origins of disease risk. Open source tools and workflows are presented. This work exposes areas for improvement in tooling for data harmonization, which is a fundamental requirement for efficient research progress.
Systems Biology