GPAD: a natural language processing-based application to extract the gene-disease association discovery information from OMIM

K. M. Tahsin Hassan Rahit, Vladimir Avramovic, Jessica X. Chong, Maja Tarailo-Graovac
DOI: https://doi.org/10.1186/s12859-024-05693-x
IF: 3.307
2024-02-29
BMC Bioinformatics
Abstract: Thousands of genes have been associated with different Mendelian conditions. One valuable source for tracking these gene-disease associations (GDAs) is the Online Mendelian Inheritance in Man (OMIM) database. However, most of the information in OMIM is textual and heterogeneous (e.g. summarized by different experts), which complicates automated reading and understanding of the data. Here, we used Natural Language Processing (NLP) to build a tool, Gene-Phenotype Association Discovery (GPAD), that can syntactically process OMIM text and extract the data of interest.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology