Expressed Sequence Tags with cDNA Termini: Previously Overlooked Resources for Gene Annotation and Transcriptome Exploration in Chlamydomonas reinhardtii

Chun Liang, Yuansheng Liu, Lin Liu, Adam C. Davis, Yingjia Shen, Qingshun Quinn Li
DOI: https://doi.org/10.1534/genetics.107.085605
IF: 4.402
2008-01-01
Genetics
Abstract: Many of the Chlamydomonas reinhardtii expressed sequence tags (ESTs) in GenBank dbEST and community EST assemblies were either over- or undertrimmed in terms of their cDNA termini, which are defined as the diagnostic sequence elements that delineate 3'/5' ends of mRNA transcripts. Overtrimming represents a loss of directional, positional, and structural information of transcript ends, whereas undertrimming leaves spurious sequences in ESTs that exert deleterious impacts on downstream EST-based applications. We examined 309,278 raw EST sequencing trace files of C. reinhardtii and found that only 57% had cDNA termini that matched the expected structures specified in their cDNA library constructions while satisfying our minimum length requirement for their final clean sequences. Using GMAP, 156,963 individual ESTs were mapped to the genome successfully, with their in silico-verified cDNA termini anchored to the genome. Our data analysis suggested strong macro- and microheterogeneity of 3'/5' end positions of individual transcripts derived from the same genes in C. reinhardtii. This work annotating differential ends of individual transcripts in the draft genome presents the research community with a new stream of data that will facilitate accurate determination of gene structures, genome annotation, and exploration of the transcriptome and mRNA metabolism in C. reinhardtii.