Harnessing generative AI to annotate the severity of all phenotypic abnormalities within the Human Phenotype Ontology

Kitty Murphy, Brian M Schilder, Nathan G Skene
DOI: https://doi.org/10.1101/2024.06.10.24308475
2024-06-11
Abstract: There are thousands of human phenotypes which are linked to genetic variation. These range from the benign (white eyelashes) to the deadly (respiratory failure). The Human Phenotype Ontology (HPO) has categorised all human phenotypic variation into a unified framework that defines the relationships between phenotypes (e.g. missing arms and missing legs are both abnormalities of the limb). This has made it possible to perform phenome-wide analyses, e.g. to prioritise which phenotypes make the best candidates for gene therapies. However, there is currently limited metadata describing the clinical characteristics / severity of these phenotypes. With >17500 phenotypic abnormalities across >8600 rare diseases, manual curation of such phenotypic annotations by experts would be exceedingly labour-intensive and time-consuming. Leveraging advances in artificial intelligence, we employed the OpenAI GPT-4 large language model (LLM) to systematically annotate the severity of all phenotypic abnormalities in the HPO. Phenotypic severity was defined using a set of clinical characteristics and their frequency of occurrence. First, we benchmarked the generative LLM clinical characteristic annotations against ground-truth labels within the HPO (e.g. phenotypes in the "Cancer" HPO branch were annotated as causing cancer by GPT-4). True positive recall rates across different clinical characteristics ranged from 89-100% (mean=96%), clearly demonstrating the ability of GPT-4 to automate the curation process with a high degree of fidelity. Using a novel approach, we developed a severity scoring system that incorporates both the nature of the clinical characteristic and the frequency of its occurrence. These clinical characteristic severity metrics will enable efforts to systematically prioritise which human phenotypes are most detrimental to human health, and which are the best targets for therapeutic intervention.
Genetic and Genomic Medicine
What problem does this paper attempt to address?
This paper addresses the lack of systematic severity annotations for the phenotypic abnormalities in the Human Phenotype Ontology (HPO). The HPO contains more than 17,500 phenotypic abnormalities spanning more than 8,600 rare diseases, so having experts manually annotate the clinical characteristics and severity of every phenotype would be time-consuming and labor-intensive. The research team therefore uses OpenAI's GPT-4 large language model to automate the annotation. To check accuracy, they benchmark the GPT-4 annotations against ground-truth labels already present in the HPO and report true positive recall rates of 89-100% (mean 96%) across clinical characteristics. From these annotations they develop a severity scoring system that combines the nature of each clinical characteristic with how frequently it occurs. The resulting scores help prioritize which phenotypes have the greatest impact on human health and which are the most suitable targets for therapeutic intervention. Additionally, this framework can assist clinicians in quickly diagnosing or prioritizing certain phenotypes and provide researchers with information on the potential impact and research needs of target diseases.
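To make the idea of a score that combines the nature of a clinical characteristic with its frequency more concrete, here is a minimal Python sketch. It is not the authors' implementation: the characteristic names, the weights, the frequency labels and their numeric values, and the 0-100 normalisation are all assumptions chosen for illustration.

```python
# Hypothetical weights: how strongly each clinical characteristic
# contributes to severity. These values are illustrative assumptions,
# not taken from the paper.
CHARACTERISTIC_WEIGHTS = {
    "death": 5.0,
    "cancer": 3.0,
    "reduced_fertility": 1.0,
}

# Hypothetical numeric encoding of how often a phenotype causes
# a given characteristic.
FREQUENCY_VALUES = {"never": 0, "rarely": 1, "often": 2, "always": 3}


def severity_score(annotations):
    """Return a 0-100 severity score for one phenotype.

    `annotations` maps a characteristic name to a frequency label.
    Characteristics that are not annotated are treated as "never".
    """
    max_frequency = max(FREQUENCY_VALUES.values())
    # Weighted sum of frequency values over all characteristics.
    total = sum(
        weight * FREQUENCY_VALUES[annotations.get(name, "never")]
        for name, weight in CHARACTERISTIC_WEIGHTS.items()
    )
    # Highest possible total: every characteristic occurs "always".
    max_total = max_frequency * sum(CHARACTERISTIC_WEIGHTS.values())
    return 100.0 * total / max_total


# Made-up annotations for two phenotypes, for illustration only.
mild = severity_score({"reduced_fertility": "rarely"})
severe = severity_score({"death": "always", "reduced_fertility": "rarely"})
print(f"mild={mild:.1f}, severe={severe:.1f}")
```

Under these assumptions, a phenotype that frequently causes a heavily weighted characteristic scores higher than one that rarely causes a lightly weighted one, which is the ordering a severity ranking needs.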