Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing.

Jessie J-Y Chang, Xuan Yang, Haotian Teng, Benjamin Reames, Vincent Corbin, Lachlan J M Coin
DOI: https://doi.org/10.1101/2024.10.25.620206
2024-10-25
Abstract: Polyadenylation is a dynamic process that is important in cellular physiology. Oxford Nanopore Technologies direct RNA sequencing provides a strategy for sequencing full-length RNA molecules and analysing the transcriptome and epitranscriptome. Several tools are currently available for poly(A) tail-length estimation, including the well-established tailfindr and nanopolish, as well as two more recent deep learning models, Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate all four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail lengths and therefore allow the accuracy of tail-length estimation to be measured directly. All four tools generate mean tail-length estimates that lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach because of its relatively fast run times, low coefficient of variation, and ease of use through integration with base-calling.
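The two accuracy measures named in the abstract, the deviation of the mean estimate from the known tail length and the coefficient of variation, can be illustrated with a minimal sketch. The function name, the per-read estimates, and the 30 nt reference length below are hypothetical values chosen for illustration; they are not data or code from the paper, and the paper may compute these quantities differently.

```python
from statistics import mean, stdev


def summarize_tail_estimates(estimates, true_length):
    """Summarize per-read poly(A) length estimates for one standard.

    Returns the mean estimate, the signed relative error of that mean
    against the known tail length, and the coefficient of variation
    (sample standard deviation divided by the mean).
    """
    m = mean(estimates)
    rel_error = (m - true_length) / true_length
    cv = stdev(estimates) / m
    return m, rel_error, cv


# Hypothetical per-read estimates (nt) for a standard assumed to carry
# a 30 nt poly(A) tail; these numbers are illustrative only.
estimates = [28, 35, 31, 38, 33]
m, rel_error, cv = summarize_tail_estimates(estimates, true_length=30)

print(f"mean estimate: {m:.1f} nt")
print(f"relative error of the mean: {rel_error:+.1%}")
print(f"coefficient of variation: {cv:.3f}")
```

A tool can have a mean close to the true value while still showing a wide spread across reads, which is why the two measures are reported separately.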
Bioinformatics
What problem does this paper attempt to address?