GDC: An Integrated Resource to Explore the Pathogenesis of Hearing Loss through Genetics and Genomics

Hui Cheng, Xuegang Wang, Mingjun Zhong, Jia Geng, Wenjian Li, Kanglu Pei, Yu Lu, Jing Cheng, Fengxiao Bu, Huijun Yuan
DOI: https://doi.org/10.1101/2024.08.19.608726
2024-08-20
Abstract: Effective research and clinical application in audiology and hearing loss (HL) often require the integration of diverse data. However, the absence of a dedicated database impeded understanding and insight extraction in HL. To address this, the Genetic Deafness Commons (GDC) was developed by consolidating extensive genetic and genomic data from 51 public databases and the Chinese Deafness Genetics Consortium, encompassing 5,983,613 variants across 201 HL genes. This comprehensive dataset detailed the genetic landscape of HL, identifying six novel mutational hotspots within DNA binding domains of transcription factor genes, which were eligible for evidence-based variant pathogenicity classification. Comparative phenotypic analyses highlighted considerable disparities between human and mouse models, with only 130 human HL genes exhibiting hearing abnormality in mice. Moreover, gene expression analyses in the cochleae of mice and rhesus macaques demonstrated a notable correlation (R² = 0.76). Utilizing gene expression, function, pathway, and phenotype data, a SMOTE-Random Forest model identified 18 candidate HL genes, including TBX2 and ERCC2, newly confirmed as HL genes. The GDC, as a comprehensive and unified repository, significantly advances audiology research and clinical practice by enhancing data accessibility and usability, thereby facilitating deeper insights into hearing disorders.
Genetics
What problem does this paper attempt to address?
The paper attempts to address the following major issues:

1. **Integration of Genetic and Genomic Data**: Hearing loss (HL) research and clinical applications often require the integration of multiple data sources, but the lack of a dedicated database has hindered in-depth analysis. The researchers therefore developed the Genetic Deafness Commons (GDC), which integrates data from 51 public databases and the Chinese Deafness Genetics Consortium (CDGC), covering 5,983,613 variants across 201 HL genes.
2. **Pathogenicity Classification of Genetic Variants**: By integrating information from multiple sources (such as CDGC, DVD, ClinVar, and HGMD), the GDC supports detailed pathogenicity classification of genetic variants. The study found that some variants are classified differently across databases, indicating that further validation is needed to improve classification accuracy.
3. **Identification of New Candidate Genes**: Using a SMOTE-Random Forest model trained on gene expression, function, pathway, and phenotype data, the study identified 18 candidate HL genes, including TBX2 and ERCC2, which have been newly confirmed as HL genes.
4. **Analysis of Differences Between Human and Mouse Models**: Comparing HL genes between humans and mouse models revealed considerable disparities. Only 130 human HL genes showed hearing abnormalities in mice; the remaining genes either showed no hearing phenotype or lacked relevant mouse data.
5. **Gene Expression Characteristic Analysis**: Gene expression analysis of cochlear cell types in mice and rhesus macaques revealed the expression patterns of key genes in different cell types and their roles in hearing development, with a notable correlation between the two species (R² = 0.76).

In summary, the authors present the GDC as a comprehensive, unified repository that advances research and clinical practice in hearing loss by improving data accessibility and usability, thereby supporting a deeper understanding of hearing disorders.
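
The cross-database classification conflicts described in point 2 can be illustrated with a small sketch. This is not the GDC's actual pipeline: the variant identifiers, the example records, and the `find_conflicts` helper are hypothetical, and the code assumes that each source's labels have already been normalized to ACMG-style five-tier terms. A variant is flagged here only when its sources disagree at the level of broad tiers (pathogenic/likely pathogenic, uncertain, benign/likely benign).

```python
from collections import defaultdict

# Hypothetical records: (variant, source database, classification).
# Variant IDs and labels are invented for illustration only.
RECORDS = [
    ("variant_A", "ClinVar", "Pathogenic"),
    ("variant_A", "DVD", "Likely pathogenic"),
    ("variant_B", "ClinVar", "Benign"),
    ("variant_B", "HGMD", "Pathogenic"),
    ("variant_C", "CDGC", "Likely pathogenic"),
]

# Assumed mapping from five-tier labels to three broad tiers.
TIER = {
    "Pathogenic": "P/LP",
    "Likely pathogenic": "P/LP",
    "Uncertain significance": "VUS",
    "Likely benign": "B/LB",
    "Benign": "B/LB",
}


def find_conflicts(records):
    """Return {variant: {database: label}} for variants whose
    sources disagree at the broad-tier level."""
    by_variant = defaultdict(dict)
    for variant, database, label in records:
        by_variant[variant][database] = label

    conflicts = {}
    for variant, calls in by_variant.items():
        tiers = {TIER[label] for label in calls.values()}
        if len(tiers) > 1:  # more than one broad tier means a conflict
            conflicts[variant] = calls
    return conflicts


conflicts = find_conflicts(RECORDS)
for variant, calls in conflicts.items():
    print(variant, calls)
```

Comparing broad tiers rather than exact labels is a design choice in this sketch: "Pathogenic" versus "Likely pathogenic" is treated as agreement, so only clinically meaningful disagreements are surfaced for review.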