Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Francisco J. Pardo-Palacios, Dingjie Wang, Fairlie Reese, Mark Diekhans, Sílvia Carbonell-Sala, Brian Williams, Jane E. Loveland, Maite De María, Matthew S. Adams, Gabriela Balderrama-Gutierrez, Amit K. Behera, Jose M. Gonzalez Martinez, Toby Hunt, Julien Lagarde, Cindy E. Liang, Haoran Li, Marcus Jerryd Meade, David A. Moraga Amador, Andrey D. Prjibelski, Inanc Birol, Hamed Bostan, Ashley M. Brooks, Muhammed Hasan Çelik, Ying Chen, Mei R. M. Du, Colette Felton, Jonathan Göke, Saber Hafezqorani, Ralf Herwig, Hideya Kawaji, Joseph Lee, Jian-Liang Li, Matthias Lienhard, Alla Mikheenko, Dennis Mulligan, Ka Ming Nip, Mihaela Pertea, Matthew E. Ritchie, Andre D. Sim, Alison D. Tang, Yuk Kei Wan, Changqing Wang, Brandon Y. Wong, Chen Yang, If Barnes, Andrew E. Berry, Salvador Capella-Gutierrez, Alyssa Cousineau, Namrita Dhillon, Jose M. Fernandez-Gonzalez, Luis Ferrández-Peral, Natàlia Garcia-Reyero, Stefan Götz, Carles Hernández-Ferrer, Liudmyla Kondratova, Tianyuan Liu, Alessandra Martinez-Martin, Carlos Menor, Jorge Mestre-Tomás, Jonathan M. Mudge, Nedka G. Panayotova, Alejandro Paniagua, Dmitry Repchevsky, Xingjie Ren, Eric Rouchka, Brandon Saint-John, Enrique Sapena, Leon Sheynkman, Melissa Laird Smith, Marie-Marthe Suner, Hazuki Takahashi, Ingrid A. Youngworth, Piero Carninci, Nancy D. Denslow, Roderic Guigó, Margaret E. Hunter, Rene Maehr, Yin Shen, Hagen U. Tilgner, Barbara J. Wold, Christopher Vollmers, Adam Frankish, Kin Fai Au, Gloria M. Sheynkman, Ali Mortazavi, Ana Conesa, Angela N. Brooks

DOI: https://doi.org/10.1038/s41592-024-02298-3

IF: 48

2024-06-08

Nature Methods

Abstract: The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

biochemical research methods

What problem does this paper attempt to address?

This paper systematically evaluates how effectively and accurately long-read RNA sequencing (lrRNA-seq) methods identify and quantify transcripts. The evaluation is organized into three challenges:

1. **Challenge 1: Reconstruct full-length transcripts for high-quality genomes**
   - Evaluate the ability of different methods to detect known and novel transcripts in high-quality genomes.
   - Compare how well different experimental methods and tools detect full-length transcripts.
2. **Challenge 2: Quantify transcript abundance**
   - Evaluate the accuracy and consistency of different methods in quantifying transcript expression levels.
   - Verify the quantification results using multiple replicate samples and orthogonal data (such as short-read sequencing data).
3. **Challenge 3: De novo transcript reconstruction for species lacking high-quality reference genomes**
   - Evaluate the ability of different methods to detect and reconstruct transcripts in the absence of a high-quality reference genome.
   - Pay special attention to methods for detecting rare and novel transcripts in genomes with low annotation quality.

Through these challenges, the paper aims to provide a benchmark for current transcriptome analysis methods and to guide the development of future methods. The results show that long-read sequencing has clear potential for detecting full-length and novel transcripts, but that quantifying transcript abundance remains challenging. The study also emphasizes the importance of combining orthogonal data and replicate samples, especially when detecting rare and novel transcripts.
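To make the known-versus-novel distinction in Challenge 1 concrete, the following is a minimal sketch of one way a predicted transcript set could be compared against a reference annotation, by matching intron chains exactly. It is an illustration only and not the consortium's evaluation pipeline: the function name `detection_metrics`, the intron tuple layout, and the two summary ratios are all assumptions introduced here.

```python
from typing import Dict, Tuple

# Hypothetical representation: an intron is (chromosome, start, end, strand),
# and a transcript is summarized by its ordered tuple of introns.
Intron = Tuple[str, int, int, str]
IntronChain = Tuple[Intron, ...]


def detection_metrics(
    predicted: Dict[str, IntronChain],
    reference: Dict[str, IntronChain],
) -> Dict[str, float]:
    """Compare predicted transcripts to a reference by exact intron-chain match."""
    pred_chains = set(predicted.values())
    ref_chains = set(reference.values())

    known = pred_chains & ref_chains   # chains also present in the annotation
    novel = pred_chains - ref_chains   # chains absent from the annotation

    # Share of annotated chains that were recovered by the predictions.
    sensitivity = len(known) / len(ref_chains) if ref_chains else 0.0
    # Share of predicted chains that match the annotation. A low value is not
    # necessarily an error rate, since novel chains may be real transcripts.
    known_fraction = len(known) / len(pred_chains) if pred_chains else 0.0

    return {
        "known": float(len(known)),
        "novel": float(len(novel)),
        "sensitivity": sensitivity,
        "known_fraction": known_fraction,
    }


# Toy usage with made-up coordinates.
chain_a = (("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"))
chain_b = (("chr1", 100, 250, "+"),)
chain_c = (("chr2", 500, 600, "-"),)

reference = {"T1": chain_a, "T2": chain_b}
predicted = {"P1": chain_a, "P2": chain_c}
metrics = detection_metrics(predicted, reference)
```

Because a novel chain may be a genuine unannotated isoform, a sketch like this cannot separate true discoveries from artifacts on its own, which is consistent with the summary's emphasis on orthogonal data and replicate samples.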
