A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing

Yangyang Li, Ting-You Wang, Qingxiang Guo, Yanan Ren, Xiaotong Lu, Qi Cao, Rendong Yang
DOI: https://doi.org/10.1101/2024.10.23.619929
2024-10-26
Abstract: Chimera artifacts in nanopore direct RNA sequencing (dRNA-seq) can significantly distort transcriptome analyses, yet their detection and removal remain challenging due to limitations in existing basecalling models. We present DeepChopper, a genomic language model that precisely identifies and removes adapter sequences from base-called dRNA-seq long reads at single-base resolution, operating independently of raw signal or alignment information to effectively eliminate chimeric read artifacts. By removing these artifacts, DeepChopper substantially improves the accuracy of critical downstream analyses, such as transcript annotation and gene fusion detection, thereby enhancing the reliability and utility of nanopore dRNA-seq for transcriptomics research.
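To make the idea of single-base-resolution adapter removal concrete, the following is a minimal sketch of the post-processing step only, not DeepChopper's actual implementation. It assumes that some model has already produced one boolean label per base (True for adapter) and shows how such labels could be used to cut a chimeric read into its non-adapter segments. The function name `split_on_adapters`, the `min_len` parameter, and the toy read are hypothetical and chosen for illustration.

```python
from typing import List


def split_on_adapters(read: str, is_adapter: List[bool], min_len: int = 1) -> List[str]:
    """Return the non-adapter segments of a read, given one label per base.

    Hypothetical helper: `is_adapter[i]` is True when base i is predicted
    to belong to an adapter. Segments shorter than `min_len` are dropped.
    """
    if len(read) != len(is_adapter):
        raise ValueError("read and per-base labels must have the same length")

    segments: List[str] = []
    start = None  # start index of the current non-adapter run, if any
    for i, flag in enumerate(is_adapter):
        if not flag and start is None:
            start = i  # a non-adapter run begins here
        elif flag and start is not None:
            if i - start >= min_len:
                segments.append(read[start:i])
            start = None  # the run ended at an adapter base
    # Close a run that extends to the end of the read.
    if start is not None and len(read) - start >= min_len:
        segments.append(read[start:])
    return segments


# Toy example: an internal adapter (bases 4..7) joins two transcripts.
toy_read = "ACGTNNNNTTGA"
toy_labels = [False] * 4 + [True] * 4 + [False] * 4
toy_segments = split_on_adapters(toy_read, toy_labels)
```

In this sketch an adapter in the middle of a read yields two output segments, while an adapter at the end of a read is simply trimmed. Because the labels are per base, no raw signal or alignment is needed at this step.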
Genomics
What problem does this paper attempt to address?