Integrate Heterogeneous NGS and TGS Data to Boost Genome-free Transcriptome Research

Yangmei Qin, Zhe Lin, Dan Shi, Mindong Zhong, Te An, Linshan Chen, Yiquan Wang, Fan Lin, Guang Li, Zhi-Liang Ji
DOI: https://doi.org/10.1101/2020.05.27.117796
2020-01-01
Abstract: Conducting reliable transcriptomic research under varying degrees of genome availability is a long-standing challenge. Here, we developed a genome-free computational method to aid accurate transcriptome assembly, using the amphioxus as an example. By integrating ten next-generation sequencing (NGS) transcriptome datasets and one third-generation sequencing (TGS) dataset, we built a sequence library of non-redundant expressed transcripts for the amphioxus. The library comprises 91,915 distinct transcripts in total, including 51,549 protein-coding transcripts and 16,923 novel extragenic transcripts. This substantially improved the current amphioxus genome annotation, expanding the number of distinct genes from 21,954 to 38,777. We confirmed that the library significantly outperformed both the genome and the de novo method in transcriptome assembly in multiple respects. For convenience, we curated the Integrative Transcript Library database of the amphioxus (). In summary, this work provides a practical solution for most organisms to alleviate the heavy dependence of transcriptome research on a good-quality genome. It also ensures that amphioxus transcriptome research is grounded on reliable data.