Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Francisco J. Pardo-Palacios, Dingjie Wang, Fairlie Reese, Mark Diekhans, Sílvia Carbonell-Sala, Brian Williams, Jane E. Loveland, Maite De María, Matthew S. Adams, Gabriela Balderrama-Gutierrez, Amit K. Behera, Jose M. Gonzalez Martinez, Toby Hunt, Julien Lagarde, Cindy E. Liang, Haoran Li, Marcus Jerryd Meade, David A. Moraga Amador, Andrey D. Prjibelski, Inanc Birol, Hamed Bostan, Ashley M. Brooks, Muhammed Hasan Çelik, Ying Chen, Mei R. M. Du, Colette Felton, Jonathan Göke, Saber Hafezqorani, Ralf Herwig, Hideya Kawaji, Joseph Lee, Jian-Liang Li, Matthias Lienhard, Alla Mikheenko, Dennis Mulligan, Ka Ming Nip, Mihaela Pertea, Matthew E. Ritchie, Andre D. Sim, Alison D. Tang, Yuk Kei Wan, Changqing Wang, Brandon Y. Wong, Chen Yang, If Barnes, Andrew E. Berry, Salvador Capella-Gutierrez, Alyssa Cousineau, Namrita Dhillon, Jose M. Fernandez-Gonzalez, Luis Ferrández-Peral, Natàlia Garcia-Reyero, Stefan Götz, Carles Hernández-Ferrer, Liudmyla Kondratova, Tianyuan Liu, Alessandra Martinez-Martin, Carlos Menor, Jorge Mestre-Tomás, Jonathan M. Mudge, Nedka G. Panayotova, Alejandro Paniagua, Dmitry Repchevsky, Xingjie Ren, Eric Rouchka, Brandon Saint-John, Enrique Sapena, Leon Sheynkman, Melissa Laird Smith, Marie-Marthe Suner, Hazuki Takahashi, Ingrid A. Youngworth, Piero Carninci, Nancy D. Denslow, Roderic Guigó, Margaret E. Hunter, Rene Maehr, Yin Shen, Hagen U. Tilgner, Barbara J. Wold, Christopher Vollmers, Adam Frankish, Kin Fai Au, Gloria M. Sheynkman, Ali Mortazavi, Ana Conesa, Angela N. Brooks
DOI: https://doi.org/10.1038/s41592-024-02298-3
IF: 48
2024-06-08
Nature Methods
Abstract: The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
biochemical research methods
What problem does this paper attempt to address?
This paper systematically evaluates how effectively and accurately long-read RNA sequencing (lrRNA-seq) methods identify and quantify transcripts. The evaluation is organized into three challenges:

1. **Challenge 1: Reconstruct full-length transcripts with a high-quality genome**
   - Evaluate how well different methods detect known and novel transcripts when a high-quality genome is available.
   - Compare full-length transcript detection across multiple experimental methods and analysis tools.
2. **Challenge 2: Quantify transcript abundance**
   - Evaluate the accuracy and consistency of different methods in quantifying transcript expression levels.
   - Verify the quantification results using multiple replicate samples and orthogonal data (such as short-read sequencing data).
3. **Challenge 3: De novo transcript reconstruction for species lacking high-quality reference genomes**
   - Evaluate how well different methods detect and reconstruct transcripts when no high-quality reference genome is available.
   - Pay special attention to methods for detecting rare and novel transcripts in genomes with low annotation quality.

Through these challenges, the paper aims to benchmark current transcriptome analysis methods and guide the development of future ones. The results show that long-read sequencing has clear potential for detecting full-length and novel transcripts but still faces challenges in quantifying transcript abundance. The study also emphasizes the importance of combining orthogonal data and replicate samples, especially when detecting rare and novel transcripts.
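
To make the idea behind Challenge 1 concrete, the following is a minimal sketch of one common way to score transcript detection against a reference annotation: two transcripts are treated as the same isoform when their ordered introns (the intron chain) are identical, and sensitivity and precision are computed from the overlap. This is an illustrative assumption, not the consortium's evaluation pipeline; the function names `intron_chain` and `detection_metrics` and the coordinate convention are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An intron is (chromosome, strand, end of upstream exon, start of downstream exon).
Intron = Tuple[str, str, int, int]
IntronChain = Tuple[Intron, ...]


def intron_chain(chrom: str, strand: str, exons: List[Tuple[int, int]]) -> IntronChain:
    """Derive the ordered introns between consecutive exons of one transcript.

    Transcript start and end positions are ignored, so two models that differ
    only at their 5' or 3' ends yield the same chain. A single-exon transcript
    yields an empty chain and would need separate handling in a real analysis.
    """
    ordered = sorted(exons)
    return tuple(
        (chrom, strand, left_end, right_start)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )


def detection_metrics(
    reference: Set[IntronChain], detected: Set[IntronChain]
) -> Dict[str, float]:
    """Compute sensitivity and precision of detected chains against a reference set."""
    matched = reference & detected
    sensitivity = len(matched) / len(reference) if reference else 0.0
    precision = len(matched) / len(detected) if detected else 0.0
    return {"sensitivity": sensitivity, "precision": precision}


if __name__ == "__main__":
    # Hypothetical toy coordinates, used only to show how the functions are called.
    reference = {
        intron_chain("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
        intron_chain("chr1", "+", [(100, 200), (500, 600)]),
    }
    detected = {
        intron_chain("chr1", "+", [(90, 200), (300, 400), (500, 650)]),
        intron_chain("chr1", "+", [(100, 250), (500, 600)]),
    }
    print(detection_metrics(reference, detected))
```

Exact intron-chain matching is a deliberately strict criterion. In practice, the study's recommendation to use orthogonal data and replicates matters most for novel transcripts, which by definition have no reference chain to match against.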