Toward Automatic Variant Interpretation: Discordant Genetic Interpretation Across Variant Annotations for ClinVar Pathogenic Variants

Ann, Yu-An Chen, Tzu-Hang Yuan, Jia-Hsing Huang, Yu-Bin Wang, Tzu-Mao Hung, Chien-Yu Chen, Jacob Shujui Hsu, Pei-Lung Chen
DOI: https://doi.org/10.1101/2024.10.11.617756
2024-10-15
Abstract: Purpose: High-throughput sequencing has revolutionized the diagnosis of genetic disorders, but interpreting variant pathogenicity remains challenging. Although the Human Genome Variation Society (HGVS) provides recommendations for variant nomenclature, discrepancies in annotation remain a significant hurdle. Methods: This study evaluated annotation concordance among three tools (ANNOVAR, SnpEff, and Variant Effect Predictor (VEP)) using 164,549 two-star variants from ClinVar. The analysis used string-match comparisons of HGVS nomenclature to assess the consistency of each tool's annotations, the corresponding coding impacts, and the ACMG criteria inferred from the annotations. Results: Concordance varied, with 58.52% agreement for HGVSc, 84.04% for HGVSp, and 85.58% for coding impact. SnpEff showed the highest match rate for HGVSc (0.988), while VEP performed best for HGVSp (0.977). Substantial discrepancies were noted in the loss-of-function (LoF) category. Incorrect PVS1 interpretations affected the final pathogenicity classification and downgraded pathogenic/likely pathogenic (PLP) variants (ANNOVAR 55.9%, SnpEff 66.5%, VEP 67.3%), risking false-negative reporting of clinically relevant variants. Conclusions: These findings highlight critical challenges in accurately interpreting variant pathogenicity caused by annotation discrepancies. To improve the reliability of genetic variant interpretation in clinical practice, it is essential to standardize transcript sets and systematically cross-validate results across multiple annotation tools.
Genomics
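The HGVS string-match comparison described in the abstract can be illustrated with a short Python sketch. It assumes, for illustration only, that each variant carries one HGVSc string per source and that per-tool match rates are computed against the expression recorded in ClinVar; the record layout, the example strings, and the function names `all_tools_agree` and `match_rate_to_reference` are hypothetical and not taken from the study. Exact string equality is deliberately strict: an equivalent call written in a different style (for example `c.215delA` versus `c.215del`) counts as a mismatch, which is one way nomenclature differences lower measured concordance.

```python
from typing import Dict, List, Optional

# Hypothetical toy records: one dict per variant, mapping a source name to the
# HGVSc string it reported (None when the tool gave no annotation).
# These strings are illustrative and are NOT taken from the study's data.
VARIANTS: List[Dict[str, Optional[str]]] = [
    {"ClinVar": "c.100C>T", "ANNOVAR": "c.100C>T", "SnpEff": "c.100C>T", "VEP": "c.100C>T"},
    {"ClinVar": "c.215del", "ANNOVAR": "c.215delA", "SnpEff": "c.215del", "VEP": "c.215del"},
    {"ClinVar": "c.76_77dup", "ANNOVAR": "c.77_78insAG", "SnpEff": "c.76_77dup", "VEP": None},
]

TOOLS = ("ANNOVAR", "SnpEff", "VEP")


def all_tools_agree(record: Dict[str, Optional[str]]) -> bool:
    """True when every tool returned the same, non-missing HGVS string."""
    calls = {record.get(tool) for tool in TOOLS}
    return len(calls) == 1 and None not in calls


def match_rate_to_reference(records: List[Dict[str, Optional[str]]],
                            tool: str, reference: str = "ClinVar") -> float:
    """Fraction of records whose tool string exactly equals the reference string."""
    hits = sum(
        1 for r in records
        if r.get(tool) is not None and r.get(tool) == r.get(reference)
    )
    return hits / len(records)


if __name__ == "__main__":
    concordance = sum(all_tools_agree(r) for r in VARIANTS) / len(VARIANTS)
    print(f"three-tool concordance: {concordance:.2%}")
    for tool in TOOLS:
        print(tool, round(match_rate_to_reference(VARIANTS, tool), 3))
```

On these three toy records only the first variant is reported identically by all three tools, and a missing annotation (`None`) is counted as a non-match rather than skipped.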
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses a key problem in genetic variant interpretation. Although high-throughput sequencing is now widely used to diagnose genetic disorders, accurately interpreting variant pathogenicity remains a major challenge. The Human Genome Variation Society (HGVS) provides standards and guidelines for variant nomenclature, yet annotations still differ substantially between tools, which undermines the reliability of variant interpretation in clinical practice. Specifically, the study focuses on the following aspects:

1. **Annotation consistency**: Different annotation tools (ANNOVAR, SnpEff, and Variant Effect Predictor (VEP)) annotate the same variant inconsistently, particularly in HGVS nomenclature and predicted coding impact.
2. **Interpretation of loss-of-function (LoF) variants**: When a tool misannotates a loss-of-function variant, the PVS1 criterion may be applied incorrectly, which can lower the final pathogenicity classification and produce false-negative reports of clinically important variants.
3. **Application of the ACMG guidelines**: Because annotation tools disagree, automated ACMG classification built on their output can reach inconsistent conclusions, which in turn affects the clinical interpretation of variants.

### Research purposes

To improve the accuracy and consistency of variant interpretation in clinical practice, this study evaluated the annotation concordance of three widely used tools (ANNOVAR, SnpEff, and VEP) on 164,549 two-star (high review status) variants from the ClinVar database, characterized the differences between the tools, and proposed measures to reduce them.
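How a single consequence annotation can change the final classification (points 2 and 3 above) can be sketched in Python. The sketch assumes a deliberately naive PVS1 trigger, a fixed set of Sequence Ontology LoF consequence terms, and ignores the NMD, exon-position, transcript and gene-mechanism checks of the full ClinGen PVS1 decision tree. It implements only the pathogenic-side combining rules of the 2015 ACMG/AMP guidelines at their original strengths, without later ClinGen SVI refinements. The names `LOF_TERMS`, `pvs1_candidate` and `classify_pathogenic_side`, and the evidence codes in the example, are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Assumption: a naive PVS1 trigger based only on the annotated Sequence Ontology term.
# The full ClinGen PVS1 decision tree (NMD, exon position, transcript relevance,
# gene-level LoF mechanism) is intentionally not modeled here.
LOF_TERMS = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
}

# Criterion-code prefixes mapped to ACMG strength classes (PVS is checked before PS).
STRENGTH_PREFIXES = (
    ("PVS", "very_strong"),
    ("PS", "strong"),
    ("PM", "moderate"),
    ("PP", "supporting"),
)


def pvs1_candidate(consequence: str) -> bool:
    """Return True when the annotated consequence is treated as a null (LoF) effect."""
    return consequence in LOF_TERMS


def strength(code: str) -> str:
    """Map a pathogenic criterion code such as 'PM2' to its strength class."""
    for prefix, label in STRENGTH_PREFIXES:
        if code.startswith(prefix):
            return label
    raise ValueError(f"unsupported criterion code: {code}")


def classify_pathogenic_side(codes: Iterable[str]) -> str:
    """Apply the 2015 ACMG/AMP pathogenic and likely pathogenic combining rules.

    Benign criteria are not modeled, so 'Uncertain significance' here only
    means that no pathogenic-side rule was satisfied.
    """
    n = Counter(strength(code) for code in codes)
    vs, s, m, p = n["very_strong"], n["strong"], n["moderate"], n["supporting"]
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m == 1 and p == 1) or p >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m == 2 and p >= 2) or (m == 1 and p >= 4)))
    )
    if pathogenic:
        return "Pathogenic"
    likely_pathogenic = (
        (vs == 1 and m == 1)
        or (s == 1 and 1 <= m <= 2)
        or (s == 1 and p >= 2)
        or m >= 3
        or (m == 2 and p >= 2)
        or (m == 1 and p >= 4)
    )
    return "Likely pathogenic" if likely_pathogenic else "Uncertain significance"


if __name__ == "__main__":
    other_evidence = ["PM2", "PP1"]  # hypothetical evidence shared by both annotations
    for consequence in ("stop_gained", "missense_variant"):
        codes = other_evidence + (["PVS1"] if pvs1_candidate(consequence) else [])
        print(consequence, "->", classify_pathogenic_side(codes))
```

With the same additional evidence (PM2 and PP1), the variant annotated as `stop_gained` reaches Pathogenic, while the same variant annotated as `missense_variant` loses PVS1 and falls to Uncertain significance, showing how one annotation discrepancy can move a variant out of the PLP range.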
### Key findings

- **Incomplete annotation concordance**: Agreement among the tools was 58.52% for HGVSc, 84.04% for HGVSp, and 85.58% for coding impact, with the largest discrepancies in the loss-of-function (LoF) category.
- **Differences in tool performance**: SnpEff achieved the highest HGVSc match rate (0.988), while VEP achieved the highest HGVSp match rate (0.977). ANNOVAR performed less well on several measures.
- **Impact on ACMG classification**: Incorrect PVS1 interpretation downgraded pathogenic/likely pathogenic (PLP) variants, increasing the risk that clinically important variants are missed as false negatives.

### Conclusions and recommendations

The study emphasizes standardizing the transcript set and systematically cross-validating results across multiple annotation tools to improve the reliability and consistency of variant interpretation in clinical practice. Selecting the most clinically relevant transcripts and ensuring correct HGVS syntax are also crucial steps toward accurate variant interpretation.
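Part of ensuring correct HGVS syntax is recognizing that the same protein change can be written in different styles, for example the one-letter form `p.R123X` used in ANNOVAR output versus the three-letter form `p.Arg123Ter` recommended by HGVS. The Python sketch that follows normalizes simple one-letter substitutions to three-letter form and compares two calls only when they were made on the same accession, version included, reflecting the recommendation to standardize transcript sets. It handles only single amino acid substitutions and stop gains; frameshifts, indels and predicted-consequence parentheses such as `p.(Arg123Ter)` are out of scope, treating `X` as a stop is an assumption matching older notation, and the helper names `to_three_letter` and `same_protein_call` and the accession strings in the example are hypothetical.

```python
import re
from typing import Optional

# One-letter to three-letter amino acid codes. Treating "X" as a stop (Ter)
# follows the older convention used in some tool output; current HGVS uses Ter or *.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter", "X": "Ter",
}

# Only simple substitutions / stop gains are handled, e.g. "p.R123X" or "p.Arg123Ter".
ONE_LETTER_SUB = re.compile(r"p\.([A-Z*])(\d+)([A-Z*])")
THREE_LETTER_SUB = re.compile(r"p\.[A-Z][a-z]{2}\d+(?:[A-Z][a-z]{2}|=)")


def to_three_letter(hgvsp: str) -> Optional[str]:
    """Return a three-letter HGVSp string, or None if the input is not a simple substitution."""
    if THREE_LETTER_SUB.fullmatch(hgvsp):
        return hgvsp  # already in three-letter form
    match = ONE_LETTER_SUB.fullmatch(hgvsp)
    if match is None:
        return None
    ref, pos, alt = match.groups()
    if ref not in AA3 or alt not in AA3:
        return None  # letter is not a recognized amino acid code
    return f"p.{AA3[ref]}{pos}{AA3[alt]}"


def same_protein_call(tx_a: str, hgvsp_a: str, tx_b: str, hgvsp_b: str) -> bool:
    """Compare two calls only when both were made on the same accession, version included."""
    if tx_a != tx_b:
        return False
    norm_a, norm_b = to_three_letter(hgvsp_a), to_three_letter(hgvsp_b)
    return norm_a is not None and norm_a == norm_b


if __name__ == "__main__":
    print(to_three_letter("p.R123X"))
    print(same_protein_call("NM_000546.6", "p.R175H", "NM_000546.6", "p.Arg175His"))
    print(same_protein_call("NM_000546.5", "p.R175H", "NM_000546.6", "p.Arg175His"))
```

Requiring an exact accession match before comparing strings is a conservative choice: two tools pinned to different transcript versions are reported as discordant rather than silently treated as equivalent.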