Leveraging Basecaller's Move Table to Generate a Lightweight k-mer Model
Hiruna Samarakoon, Yuk Kei Wan, Sri Parameswaran, Jonathan Göke, Hasindu Gamaarachchi, Ira W Deveson
DOI: https://doi.org/10.1101/2024.06.30.601452
2024-07-01
Abstract: Nanopore sequencing by Oxford Nanopore Technologies (ONT) enables direct analysis of DNA and RNA by capturing raw electrical signals. Different nanopore chemistries have varied k-mer lengths, current levels, and standard deviations, which are stored in k-mer models. Particularly in cases where official models are lacking or unsuitable for specific sequencing conditions, tailored k-mer models are crucial to ensure precise signal-to-sequence alignment and interpretation. The process of transforming raw signals into nucleotide sequences, known as basecalling, is a fundamental step in nanopore sequencing. In this study, we leverage the basecaller's move table to create a lightweight de novo k-mer model for RNA004 chemistry. We showcase the effectiveness of our custom k-mer model through high alignment rates (97.48%) compared to larger default models. Additionally, our 5-mer model exhibits performance similar to that of the default 9-mer models in m6A methylation detection.
Bioinformatics
What problem does this paper attempt to address?
This paper shows how the move table produced by an Oxford Nanopore Technologies (ONT) basecaller can be used to generate a lightweight k-mer model, specifically for the RNA004 chemistry. In nanopore sequencing, a k-mer model stores the expected current level and standard deviation for each k-mer, and it is what allows raw electrical signals to be aligned accurately to nucleotide sequences. Custom k-mer models become particularly important when official models are missing or are unsuitable for the sequencing conditions at hand.
For the method, the authors developed a program called Poregen, which extracts information from the basecaller's move table to build a new k-mer model. They controlled model quality through filtering and sampling, and compared the custom 5-mer model against the default 9-mer model on signal-to-sequence alignment and on methylation detection (m6A).
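The move-table idea described above can be illustrated with a minimal sketch. This is not Poregen's implementation: the function names `base_segments` and `build_kmer_model` are hypothetical, and the sketch assumes a move table with one 0/1 entry per `stride` signal samples (a 1 marks the start of a new base), assumes the signal and sequence run in the same direction, and arbitrarily assigns each k-mer the signal segment of its first base. The filtering and sampling steps mentioned in the summary are omitted.

```python
from collections import defaultdict
from statistics import mean, pstdev


def base_segments(moves, stride, signal_start=0):
    """Return one (start, end) signal interval per called base.

    Assumption: moves[i] == 1 means a new base begins at sample
    signal_start + i * stride; a base ends where the next one begins.
    """
    starts = [signal_start + i * stride for i, m in enumerate(moves) if m == 1]
    end = signal_start + len(moves) * stride
    return list(zip(starts, starts[1:] + [end]))


def build_kmer_model(reads, k=5, stride=5):
    """Pool signal samples per k-mer and summarise them as (mean, stddev).

    `reads` is an iterable of (sequence, moves, signal) tuples.
    Illustrative choice: a k-mer is paired with the segment of its first base.
    """
    samples = defaultdict(list)
    for sequence, moves, signal in reads:
        segments = base_segments(moves, stride)
        for i in range(len(sequence) - k + 1):
            start, end = segments[i]
            samples[sequence[i:i + k]].extend(signal[start:end])
    return {kmer: (mean(v), pstdev(v)) for kmer, v in samples.items()}


# Toy read: 6 bases, 8 move entries, stride 2, so 16 signal samples.
toy_read = ("ACGTAC", [1, 0, 1, 1, 0, 1, 1, 1], [float(x) for x in range(16)])
model = build_kmer_model([toy_read], k=5, stride=2)
```

On real data the per-k-mer statistics would be pooled over many reads, and the choice of which base within the k-mer is paired with a segment would need to be tuned for the chemistry.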
The results show that, for RNA004 data, the custom 5-mer model achieves a high alignment rate (97.48%) when compared with the larger default models, and performs similarly to the default 9-mer model in m6A methylation detection. This suggests that a lightweight model (4^5 = 1,024 k-mers, against 4^9 = 262,144 for a 9-mer model) can reduce computational cost while maintaining accuracy in downstream analysis.
The paper also emphasizes the importance of k-mer models in nanopore data analysis and proposes a way to construct and optimize custom k-mer models when official models are unavailable or need tuning for a specific sequencing setup. These findings are relevant to improving the efficiency and accuracy of nanopore sequencing data analysis.