A qualitative review of Oxford Nanopore Sequencing datasets for RNA modifications

Madhurananda Pahar, Qian Liu
DOI: https://doi.org/10.1101/2024.09.26.615132
2024-09-27
Abstract: Many Oxford Nanopore Technologies (ONT) datasets are available for studying methylation. Methylations and other modifications occur at nucleotides such as adenine (A), cytosine (C), guanine (G), and thymine (T) or uracil (U). Some of the available datasets contain the most common methylation, m6A, while others contain m5C and other types. They use reference sequences from real organisms, such as human and mouse, as well as artificial reference sequences prepared in the laboratory, such as curlcake and IVT. To support research on ONT data, these datasets need to be organized by methylation type. Here we summarize read quality, base mapping success rates, and related metrics for these methylation types and reference genomes, using minimap2 base mapping and LongReadSum results. We find that methylated data have lower mapping success rates than non-methylated data, and that mapping quality is lower for real reference genomes such as human and mouse. This could be because these references contain more than 100,000 transcripts, whereas artificial reference sequences contain only a few. Datasets based on artificially created reference sequences have higher quality than the others and are therefore recommended for future methylation or modification classification tasks.
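As a rough illustration of what a "mapping success rate" can mean, the sketch below counts mapped and unmapped primary reads in SAM-formatted text, such as the output minimap2 produces. It is not the authors' pipeline: the function name `mapping_summary`, the toy records, and the definition of success rate (mapped primary reads divided by all primary reads) are assumptions made for this example.

```python
from typing import Iterable, Tuple

# SAM FLAG bits, as defined in the SAM format specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_summary(sam_lines: Iterable[str]) -> Tuple[int, int, float, float]:
    """Return (total reads, mapped reads, success rate, mean MAPQ of mapped reads).

    Assumption: each read is counted once, so secondary and supplementary
    alignment records are skipped.
    """
    total = mapped = mapq_sum = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # not a primary record for this read
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
            mapq_sum += int(fields[4])  # column 5 is MAPQ
    rate = mapped / total if total else 0.0
    mean_mapq = mapq_sum / mapped if mapped else 0.0
    return total, mapped, rate, mean_mapq


# Hypothetical toy records: two mapped reads, one supplementary record
# (ignored), and one unmapped read.
toy_sam = [
    "@SQ\tSN:toy_ref\tLN:100",
    "read1\t0\ttoy_ref\t10\t60\t5M\t*\t0\t0\tACGTA\t*",
    "read2\t16\ttoy_ref\t50\t20\t5M\t*\t0\t0\tACGTA\t*",
    "read2\t2064\ttoy_ref\t70\t5\t5M\t*\t0\t0\tACGTA\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\t*",
]

total, mapped, rate, mean_mapq = mapping_summary(toy_sam)
print(f"{mapped}/{total} reads mapped ({rate:.1%}), mean MAPQ {mean_mapq:.1f}")
```

Running the same function over a real SAM file (for example, by passing an open file handle) would let success rates be compared between methylated and non-methylated samples, or between real and artificial references, under the same definition.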
Bioinformatics