RNA-seq data science: From raw data to effective interpretation
Dhrithi Deshpande, Karishma Chhugani, Yutong Chang, Aaron Karlsberg, Caitlin Loeffler, Jinyang Zhang, Agata Muszynska, Jeremy Rotman, Laura Tao, Brunilda Balliu, Elizabeth Tseng, Eleazar Eskin, Fangqing Zhao, Pejman Mohammadi, Pawel P Labaj, Serghei Mangul
DOI: https://doi.org/10.48550/arXiv.2010.02391
2021-02-17
Abstract: RNA-sequencing (RNA-seq) has become an exemplar technology in modern biology and clinical applications over the past decade. It has gained immense popularity in recent years, driven by the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools. RNA-seq is a method of analyzing the RNA content of a sample using modern sequencing platforms. It generates enormous amounts of transcriptomic data in the form of nucleotide sequences, known as reads. RNA-seq analysis enables the probing of genes and their corresponding transcripts, which is essential for answering important biological questions, such as detecting novel exons and transcripts, quantifying gene expression, and studying alternative splicing structure. However, obtaining meaningful biological signals from raw data using computational methods is challenging due to the limitations of modern sequencing technologies. The need to overcome these technological challenges has driven the rapid development of many novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. Our review provides a systematic overview of RNA-seq technology and 235 available RNA-seq tools across various domains published from 2008 to 2020, discussing the interdisciplinary nature of the bioinformatics involved in RNA sequencing, analysis, and software development.
Genomics