Abstract: In cancer genomics, precise variant annotation is crucial for clinical decisions, drug development, and research. The rapidly growing volume of genomic data offers an opportunity to use data-driven approaches to generate knowledge that supports clinical decisions. Machine learning (ML) and deep learning (DL) in particular are becoming essential, as their application is fast, scalable, simple to implement, and generates reproducible results. Methods ranging from simple sequence-based alignment scoring to advanced algorithms such as logistic regression, support vector machines, and recurrent neural networks have been employed by multiple variant annotation tools. This study compares in silico methods available in Ensembl's Variant Effect Predictor (VEP) using a test dataset with COSMIC annotations. The success of ML/DL relies on robust training sets with comprehensive genomic variant features, including effects on transcription/translation, genomic context, annotation resources, in silico pathogenicity predictions, and population allele frequency. The training set, composed of known benign or pathogenic variants, serves as a reference for these algorithms to classify new and unseen variants. Our analysis reveals limited concordance between the prediction algorithms. Despite comparable numbers of true and false positives and negatives, discrepancies persist in variant classification. Certain algorithms exhibit a propensity to over- or under-call deleterious mutations, and some tend to classify random variants in non-cancer genes as deleterious. Challenges include the absence of consensus on informative features, diverse training datasets, and restriction to well-annotated proteins/transcripts. Balancing sensitivity against false positives in detecting cancer drivers is crucial. Integrating individual prediction scores with ML algorithms enhances tool performance but comes with a risk of error propagation and limited accuracy.
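The point that tools can show comparable true/false positive and negative counts while still disagreeing on which variants they call can be illustrated with a small sketch. Everything below is hypothetical: the reference labels and the two tools' calls are made-up toy values, not data from the study, and Cohen's kappa is used as one common agreement measure on the assumption that it is a reasonable stand-in, since the abstract does not state which concordance metric was applied.

```python
# Toy illustration only: labels and calls are invented, not study data.
# 1 = deleterious/driver, 0 = benign/passenger.
truth  = [1, 1, 1, 1, 0, 0, 0, 0]   # hypothetical reference annotation
tool_a = [1, 1, 1, 0, 1, 0, 0, 0]   # hypothetical calls from tool A
tool_b = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical calls from tool B


def confusion(calls, reference):
    """Return (TP, FP, TN, FN) for binary calls against reference labels."""
    pairs = list(zip(calls, reference))
    tp = sum(1 for c, t in pairs if c == 1 and t == 1)
    fp = sum(1 for c, t in pairs if c == 1 and t == 0)
    tn = sum(1 for c, t in pairs if c == 0 and t == 0)
    fn = sum(1 for c, t in pairs if c == 0 and t == 1)
    return tp, fp, tn, fn


def cohen_kappa(a, b):
    """Cohen's kappa for two binary raters: chance-corrected agreement."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance given each rater's positive rate.
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        return 1.0  # both raters constant and identical
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print("tool A (TP, FP, TN, FN):", confusion(tool_a, truth))
    print("tool B (TP, FP, TN, FN):", confusion(tool_b, truth))
    print("kappa between A and B:", cohen_kappa(tool_a, tool_b))
```

In this toy case both tools have identical confusion counts against the reference, yet their variant-level agreement is no better than chance, which is the kind of discrepancy aggregate metrics alone can hide.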
The study emphasises the need for context-specific variant classification tools, as the impact of many variants is cancer-type specific, and some may drive disease synergistically. Existing tools, designed for a "one variant, one score" approach, struggle to capture complex associations, especially those dependent on changes in the tumour microenvironment. Highlighting areas for improvement, the study addresses the "black box" problem in decision processes. While limited interpretability might not hinder practical applications, tools should evolve to assess more complex associations guided by biology. Formal consensus, reference training datasets, and standards are deemed essential for developing next-generation tools. The envisioned context-dependent tools aim to streamline feature complexity, thereby mitigating the black box problem and advancing the accuracy and interpretability of cancer variant annotation.

Citation Format: Madhumita, Zbyslaw Sondka, Jon Teague. Evaluating the utility of in silico variant annotation tools for cancer driver detection [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4884.

Abstract 4884: Evaluating the utility of in silico variant annotation tools for cancer driver detection