Full-length Transcriptome Sequencing on PacBio Platform
REN YiPeng, ZHANG JiaQing, SUN Yu, WU ZhenFeng, RUAN JiShou, HE BingJun, LIU GuoQing, GAO Shan, BU WenJun
DOI: https://doi.org/10.1360/n972015-01384
2016-01-01
Chinese Science Bulletin (Chinese Version)
Abstract: Next-generation sequencing (NGS) technology, particularly the Illumina platform, has produced most of the animal and plant transcriptomes available to date. However, the short reads from NGS sequencers yield incompletely assembled transcripts that lack important information (e.g., alternative splicing), which limits the understanding of transcriptome data. Based on single-molecule real-time (SMRT) sequencing technology, the PacBio platform can provide longer and even full-length transcripts that originate from observations of single molecules, without assembly. Full-length transcripts can be used to investigate alternative splicing, alternative polyadenylation, novel genes, non-coding RNAs, fusion transcripts, etc. As of the end of 2015, the transcriptomes of a few species had been sequenced using the PacBio platform. They are classified into three groups. The first group includes human lymphoblastoid cells and Salvia miltiorrhiza, sequenced using a combination of NGS short reads and SMRT technology. The second group includes HIV-1, bovine immunoglobulin G, human embryonic stem cells, mouse neurexins and Propithecus coquereli, sequenced using SMRT alone. The third group includes European cuttlefish, tetraploid cotton and fungi, sequenced using SMRT together with IsoSeq, the latest PacBio full-length transcriptome data analysis pipeline. The SMARTer PCR cDNA Synthesis Kit and the IsoSeq data analysis pipeline have been recommended to facilitate full-length transcriptome sequencing. However, transcriptome data quality can be affected by ribosomal RNA contamination, cross-contamination on agarose gel, the effect of size selection using gel or BluePippin, the prevalence of PCR chimera products and the incorrect removal of SMRTbell adapters. Although IsoSeq can remove artificial concatemers that are produced by an insufficient amount of SMRTbell adapters during the sequencing library preparation step, some problems still exist. For example, IsoSeq cannot distinguish PCR chimeras from true fusion genes. Another critical problem is the misidentification of the 5' and 3' primers, caused by sequencing errors or by partial trimming of the primers as SMRTbell adapters. This can assign the wrong strand information to transcripts in further analysis. In addition, transcripts of the same gene are difficult to cluster without a reference genome as a guide. Therefore, it is necessary to standardize the experimental and data analysis protocols and to design quality control measures for full-length transcriptome sequencing before it can be applied on a large scale. In this study, we sequenced the first full-length insect transcriptome, using Erthesina fullo Thunberg as material. Seven SMRT cells on a PacBio RS II sequencer were used to produce 381,394 reads with an average size of 16,262 bp. In total, 6 Gbp of effective data were used for further analysis on the optimization of experimental parameters, the design of quality control measures and the standardization of protocols using the new PacBio reagents (P6/C4). Some of the results of this study are reported here to help readers better understand full-length transcriptome sequencing technology and design experiments.
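As a quick consistency check on the sequencing yield reported above, the read count multiplied by the average read size should be close to the stated total of about 6 Gbp. The sketch below does only that arithmetic; the constant and function names are illustrative and not taken from the paper, and the abstract does not say how the "effective" figure was derived from the raw reads.

```python
# Figures as reported in the abstract.
NUM_READS = 381_394          # reads from seven SMRT cells
MEAN_READ_SIZE_BP = 16_262   # average read size in base pairs


def total_yield_bp(num_reads: int, mean_size_bp: int) -> int:
    """Return the approximate total yield in base pairs (count x mean size)."""
    return num_reads * mean_size_bp


total_bp = total_yield_bp(NUM_READS, MEAN_READ_SIZE_BP)
total_gbp = total_bp / 1e9   # 1 Gbp = 10^9 bp

print(f"{total_bp:,} bp = {total_gbp:.2f} Gbp")
```

The product comes to roughly 6.2 Gbp, in line with the approximately 6 Gbp of effective data stated in the abstract.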