Abstract: Formalin-fixed, paraffin-embedded (FFPE) tissue specimens are routinely used in pathological diagnosis, but their large number of artifactual mutations complicates the evaluation of companion diagnostics and the analysis of next-generation sequencing data. Identifying variants with low allele frequencies is challenging because existing FFPE filtering tools label all low-frequency variants as artifacts. To address this problem, we aimed to develop DEEPOMICS FFPE, an AI model that can distinguish true variants from artifacts. Paired whole exome sequencing data from fresh frozen and FFPE samples from 24 tumors were obtained from public sources and used as training and validation sets at a ratio of 7:3. A deep neural network model with three hidden layers was trained on input features derived from the output of the MuTect2 caller. Contributing features were identified using the SHapley Additive exPlanations (SHAP) algorithm and optimized based on training results. The performance of the final model (DEEPOMICS FFPE) was compared with that of existing tools (MuTect filter, FFPolish, and SOBDetector) using well-defined test datasets. We found 41 discriminating properties for FFPE artifacts. Optimization of property quantification improved the model performance. DEEPOMICS FFPE removed 99.6% of artifacts while retaining 87.1% of true variants, with an F1-score of 88.3 across the entire dataset not used for training, significantly higher than those of existing tools. Its performance was maintained even for low-allele-fraction variants, with a specificity of 0.995, suggesting that it can be used to identify subclonal variants. Unlike existing methods, DEEPOMICS FFPE identified most of the sequencing artifacts in the FFPE samples while retaining more true variants, including those with low allele frequencies.
The newly developed tool DEEPOMICS FFPE may be useful for designing capture panels for personalized circulating tumor DNA assays and for identifying candidate neoepitopes for personalized vaccine design. DEEPOMICS FFPE is freely available on the web (http://deepomics.co.kr/ffpe) for research.
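
The abstract reports three kinds of figures: the fraction of artifacts removed, the fraction of true variants retained, and an F1-score. As a minimal sketch of how such figures relate to one another, the Python below computes them from per-variant labels, assuming true variants are treated as the positive class (so "artifacts removed" corresponds to specificity and "true variants retained" to sensitivity). The `Confusion` class, the function names, and the ten toy calls are hypothetical illustrations and are not taken from the paper's code or data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # true variants kept by the filter
    fn: int  # true variants wrongly removed
    tn: int  # artifacts removed by the filter
    fp: int  # artifacts wrongly kept


def confusion(labels, predictions):
    """Count outcomes; True means 'real variant', False means 'artifact'."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(labels, predictions):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and not pred:
            tn += 1
        else:
            fp += 1
    return Confusion(tp, fn, tn, fp)


def _ratio(num, den):
    # Return 0.0 instead of raising when a class is empty.
    return num / den if den else 0.0


def sensitivity(c):
    """Fraction of true variants retained."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    """Fraction of artifacts removed."""
    return _ratio(c.tn, c.tn + c.fp)


def precision(c):
    """Fraction of retained calls that are true variants."""
    return _ratio(c.tp, c.tp + c.fp)


def f1(c):
    """Harmonic mean of precision and sensitivity."""
    p, r = precision(c), sensitivity(c)
    return _ratio(2 * p * r, p + r)


# Toy example (hypothetical calls, not data from the study):
# four true variants followed by six artifacts.
labels = [True] * 4 + [False] * 6
predictions = [True, True, True, False,
               False, False, False, False, False, True]

c = confusion(labels, predictions)
print(f"sensitivity={sensitivity(c):.3f} "
      f"specificity={specificity(c):.3f} F1={f1(c):.3f}")
```

Reporting F1 alongside specificity is useful when artifacts far outnumber true variants, because even a small fraction of retained artifacts can then noticeably lower precision.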

Deqformer: high-definition and scalable deep learning probe design method

ProbeDealer is a convenient tool for designing probes for highly multiplexed fluorescence in situ hybridization

Designing a Hybrid Chain Reaction Probe for Multiplex Transcripts Assay with High-Level Imaging

: a capture probe design toolkit for genetic diversity reconstructions from ancient environmental DNA

Degps is a Powerful Tool for Detecting Differential Expression in RNA-sequencing Studies

Pathway-enhanced Transformer-based robust model for quantifying cell types of origin of cell-free transcriptome

Expanding detection windows for discriminating single nucleotide variants using rationally designed DNA equalizer probes.

Deep flanking sequence engineering for efficient promoter design using DeepSEED

Dean Flow Assisted Single Cell and Bead Encapsulation for High Performance Single Cell Expression Profiling.

NanoDeep: a deep learning framework for nanopore adaptive sampling on microbial sequencing

Benchmarking and integration of methods for deconvoluting spatial transcriptomic data

Genome-scale Proteome Quantification by DEEP SEQ Mass Spectrometry

DeFine: Deep Convolutional Neural Networks Accurately Quantify Intensities of Transcription Factor-Dna Binding and Facilitate Evaluation of Functional Non-Coding Variants

Deep learning enables the use of ultra-high-density array in DNBSEQ

FitDevo: accurate inference of single-cell developmental potential using sample-specific gene weight

spaDesign: A Statistical Framework to Improve the Design of Sequencing-based Spatial Transcriptomics Experiments

A Novel Computational Complete Deconvolution Method Using RNA-seq Data

Chrom-pro: A User-Friendly Toolkit for De-novo Chromosome Assembly and Genomic Analysis

DEEPOMICS FFPE, a deep neural network model, identifies DNA sequencing artifacts from formalin fixed paraffin embedded tissue with high accuracy

DePS: An improved deep learning model for de novo peptide sequencing

A deep boosting based approach for capturing the sequence binding preferences of RNA-binding proteins from high-throughput CLIP-seq data