Ontology-based Annotation and Retrieval for Large-Scale VCF Data

Jian Liu, Zhi Qu, Yue Li, Jialiang Sun, Yongzhuang Liu
DOI: https://doi.org/10.1109/bibm52615.2021.9669507
2021-01-01
Abstract: Next-generation sequencing (NGS) technologies have dramatically reduced the cost of sequencing. As a result, large volumes of variant call format (VCF) data, which store mutation data, and numerous biomedical ontologies, which encode specialized biomedical knowledge for research fields such as human genetics, are now available to the bioinformatics community. Several bioinformatics tools have been developed for annotating and analyzing VCF data. However, most previous work ignores the biomedical ontologies associated with genetic data, even though they are usually beneficial for analyzing genetic diseases and for molecular diagnosis. In particular, annotating information with biomedical ontologies remains an obstacle. To integrate biomedical ontologies effectively and to enhance analysis across multiple biomedical sources, we present an automatic workflow called OntoAnnotation for annotating VCF files with biomedical ontologies. Additionally, to facilitate the retrieval of large-scale VCF data by non-bioinformaticians, we develop a web platform called OntoVarSearch, which provides a flexible engine that allows convenient access to genetic variants and ontology-based annotation information stored in a MongoDB database. Together, OntoAnnotation and OntoVarSearch give users without sufficient programming skills a simple way to annotate information with biomedical ontologies and to search data stored in VCF files.
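The general idea of ontology-based annotation and retrieval can be illustrated with a minimal Python sketch. This is not the OntoAnnotation or OntoVarSearch implementation, whose internals the abstract does not describe. The function names, the `GENE` key in the INFO column, the gene-to-term lookup table, and the `EX:` term identifiers are all hypothetical placeholders. The sketch parses one VCF data line, attaches ontology terms, and filters the resulting document-style records by term, roughly the shape of record one might store in a document database such as MongoDB.

```python
# Hypothetical gene-to-ontology-term lookup. The gene names and "EX:" IDs are
# placeholders for illustration, not real ontology terms.
GENE_TO_TERMS = {
    "GENE_A": ["EX:0000001", "EX:0000002"],
    "GENE_B": ["EX:0000003"],
}


def parse_vcf_line(line):
    """Parse one tab-separated VCF data line into a dict of its fixed fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, _qual, _filter, info = fields[:8]
    info_map = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        # INFO entries without "=" are flags; store them as True.
        info_map[key] = value if value else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),
        "info": info_map,
    }


def annotate(record, gene_to_terms):
    """Return a copy of the record with an 'ontology_terms' list attached.

    Assumes (hypothetically) that the gene symbol is stored under INFO key GENE.
    """
    gene = record["info"].get("GENE")
    annotated = dict(record)
    annotated["ontology_terms"] = list(gene_to_terms.get(gene, []))
    return annotated


def search(documents, term):
    """Return the documents annotated with the given ontology term."""
    return [doc for doc in documents if term in doc["ontology_terms"]]


# Example usage on a made-up VCF data line.
example_line = "1\t12345\trs_example\tA\tG,T\t50\tPASS\tGENE=GENE_A;DP=10;DB"
example_doc = annotate(parse_vcf_line(example_line), GENE_TO_TERMS)
print(search([example_doc], "EX:0000001"))
```

Keeping each annotated variant as a self-contained dictionary mirrors how document stores hold records, so a term lookup becomes a simple membership test on one field.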