NanoDeep: a deep learning framework for nanopore adaptive sampling on microbial sequencing

Yusen Lin, Yongjun Zhang, Hang Sun, Hang Jiang, Xing Zhao, Xiaojuan Teng, Jingxia Lin, Bowen Shu, Hao Sun, Yuhui Liao, Jiajian Zhou
DOI: https://doi.org/10.1093/bib/bbad499
IF: 9.5
2024-01-09
Briefings in Bioinformatics
Abstract: Nanopore sequencers can enrich or deplete the targeted DNA molecules in a library by reversing the voltage across individual nanopores. However, it requires substantial computational resources to achieve rapid operations in parallel at read-time sequencing. We present a deep learning framework, NanoDeep, to overcome these limitations by incorporating convolutional neural network and squeeze and excitation. We first showed that the raw squiggle derived from native DNA sequences determines the origin of microbial and human genomes. Then, we demonstrated that NanoDeep successfully classified bacterial reads from the pooled library with human sequence and showed enrichment for bacterial sequence compared with routine nanopore sequencing setting. Further, we showed that NanoDeep improves the sequencing efficiency and preserves the fidelity of bacterial genomes in the mock sample. In addition, NanoDeep performs well in the enrichment of metagenome sequences of gut samples, showing its potential applications in the enrichment of unknown microbiota. Our toolkit is available at https://github.com/lysovosyl/NanoDeep.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address several key issues in nanopore sequencing technology for microbial sequencing:

1. **Improving Sequencing Efficiency**: Although nanopore sequencers (such as Oxford Nanopore Technologies' MinION) offer portability, real-time analysis, and high resolution, their sequencing efficiency is relatively low when the microbial DNA content of a clinical sample is extremely low. The paper proposes a deep learning-based method, NanoDeep, that enriches microbial sequences in real time during the sequencing run.
2. **Reducing Sequencing of Non-target Sequences**: In clinical samples, microbial DNA is usually much less abundant than host genomic DNA. Traditional enrichment methods (such as PCR amplification or hybrid capture) are time-consuming and require specialized equipment. NanoDeep analyzes raw signals (squiggles) to identify target microbial reads in real time and reject non-target reads while they are still in the pore, thereby reducing the sequencing of non-target sequences.
3. **Increasing Sequencing Speed and Accuracy**: Existing alignment-based methods (such as Readfish and UNCALLED) require substantial computational resources and rely on genome index databases. NanoDeep instead employs convolutional neural networks (CNN) and squeeze-and-excitation (SE) modules, enabling rapid classification without sacrificing accuracy.

Through these improvements, NanoDeep not only enhances sequencing efficiency but also maintains the fidelity of microbial genomes, demonstrating its potential value in metagenomic sequencing.
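
To make the squeeze-and-excitation (SE) idea concrete, here is a minimal sketch in plain Python of a generic SE block applied to a one-dimensional feature map, followed by a keep-or-eject decision of the kind adaptive sampling relies on. It is not NanoDeep's implementation: the function names `squeeze_excite` and `decide`, the weight shapes, and the 0.5 threshold are all assumptions chosen for illustration, and a real model would learn the weights and sit on top of convolutional layers.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def squeeze_excite(features, w1, w2):
    """Generic SE block on a (channels x length) feature map.

    features: list of channels, each a list of floats (e.g. CNN outputs
              computed from a raw squiggle).
    w1: (reduced x channels) weights of the bottleneck layer (hypothetical).
    w2: (channels x reduced) weights of the gating layer (hypothetical).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: collapse the time axis to one average value per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to one gate in (0, 1) per channel.
    gates = [_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates


def decide(prob_target, threshold=0.5):
    """Map a classifier probability to an adaptive-sampling action.

    "keep" lets the read continue; "eject" stands for reversing the pore
    voltage to reject it. The threshold value is an assumption.
    """
    return "keep" if prob_target >= threshold else "eject"


if __name__ == "__main__":
    feats = [[1.0, 3.0], [2.0, 2.0]]   # 2 channels, 2 time steps
    w1 = [[0.5, -0.5]]                 # reduce 2 channels to 1 hidden unit
    w2 = [[1.0], [-1.0]]               # expand back to 2 gates
    scaled, gates = squeeze_excite(feats, w1, w2)
    print(gates, scaled, decide(0.9), decide(0.1))
```

The gates let the network emphasize informative channels and suppress the rest at the cost of two small matrix products, which is one reason SE modules are attractive when classification has to keep pace with sequencing.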