Single-molecule Real-Time (SMRT) Sequencing Facilitates Tachypleus Tridentatus Genome Annotation
Fangrui Lou, Na Song, Zhiqiang Han, Tianxiang Gao
DOI: https://doi.org/10.1016/j.ijbiomac.2020.01.029
IF: 8.2
2020-01-01
International Journal of Biological Macromolecules
Abstract: Tachypleus tridentatus is a keystone species in marine ecosystems. Its hemolymph also provides the limulus amebocyte lysate (LAL) used to detect bacterial endotoxin in human medicine. Here we combined SMRT sequencing and Illumina RNA-seq to characterize novel isoforms, novel genetic loci, fusion isoform formation, and transcriptome structure, and thereby to unveil the transcriptome complexity of T. tridentatus. We identified 26,705 non-redundant isoforms from 10,919 genetic loci, including 25,713 novel isoforms, 2403 novel genes, and 170 fusion isoforms. In addition, 1578 novel genes and 23,172 novel isoforms were annotated in the NR, Pfam, KOG, COG, eggNOG, Swiss-Prot, KEGG, and GO databases. We also obtained 4671 gene family clusters based on the genetic loci. Furthermore, 17,296 alternative polyadenylation (APA) sites, 4887 alternative splicing (AS) events, 1054 long non-coding RNAs (lncRNAs), and 1435 transcription factors (TFs) were identified in the T. tridentatus long-read transcriptome, and the target genes of the 1054 lncRNA sequences were predicted. Overall, our work provides the first long-read transcriptome of T. tridentatus; these data are needed to improve the annotation of the T. tridentatus genome and to optimize the boundaries of 12,342 genes in the original reference annotation. This information is also a potential resource for studying LAL secretion mechanisms in T. tridentatus.