Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs

Brendan J Frey, Naveed Mohammad, Quaid D Morris, Wen Zhang, Mark D Robinson, Sanie Mnaimneh, Richard Chang, Qun Pan, Eric Sat, Janet Rossant, Benoit G Bruneau, Jane E Aubin, Benjamin J Blencowe, Timothy R Hughes
DOI: https://doi.org/10.1038/ng1630
IF: 30.8
2005-01-01
Nature Genetics
Abstract: Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues and hybridized it to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.
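The abstract does not spell out GenRate's scoring function. To give some intuition for what "inferring genes with a scoring function in a factor graph" can mean, here is a toy sketch. It is not GenRate. Exons are ordered along the genome, each exon gets a label (0 = off, 1 = transcribed), and exact max-product (Viterbi) inference runs on a chain factor graph. The unary factor (mean intensity minus a `threshold`), the pairwise factor (`weight` times the Pearson correlation of adjacent exons' tissue profiles), the parameter values, and the six-tissue profiles are all hypothetical choices made for illustration.

```python
from math import sqrt

def pearson(p, q):
    """Pearson correlation of two equal-length tissue expression profiles."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((x - mp) * (y - mq) for x, y in zip(p, q))
    sp = sqrt(sum((x - mp) ** 2 for x in p))
    sq = sqrt(sum((y - mq) ** 2 for y in q))
    return cov / (sp * sq)

def unary(profile, threshold):
    """Hypothetical unary log-factor over labels [off, on] for one exon:
    exons whose mean intensity exceeds `threshold` favor 'on'."""
    return [0.0, sum(profile) / len(profile) - threshold]

def pairwise(p, q, weight):
    """Hypothetical pairwise log-factor f[a][b] for adjacent exons.
    Only the (on, on) pair is scored, by profile correlation."""
    return [[0.0, 0.0], [0.0, weight * pearson(p, q)]]

def max_product_chain(profiles, threshold=1.0, weight=2.0):
    """Exact MAP labeling of a chain factor graph via max-product (Viterbi),
    done in the log domain so factor products become sums."""
    score = unary(profiles[0], threshold)  # best prefix score ending in each label
    back = []                              # back[i-1][b]: best label of exon i-1
    for i in range(1, len(profiles)):
        f = pairwise(profiles[i - 1], profiles[i], weight)
        u = unary(profiles[i], threshold)
        new_score, ptr = [], []
        for b in (0, 1):
            cands = [score[a] + f[a][b] for a in (0, 1)]
            best_a = 0 if cands[0] >= cands[1] else 1
            ptr.append(best_a)
            new_score.append(cands[best_a] + u[b])
        score = new_score
        back.append(ptr)
    # Trace back from the best final label.
    labels = [0 if score[0] >= score[1] else 1]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    return labels[::-1]

# Hypothetical data: six tissues, two co-expressed exon groups, one weak probe.
a = [5.0, 1.0, 4.0, 1.0, 3.0, 1.0]
b = [1.0, 4.0, 1.0, 5.0, 1.0, 3.0]
noise = [0.2, 0.5, 0.1, 0.4, 0.3, 0.1]
profiles = [a, [1.2 * x for x in a], [0.8 * x for x in a],
            noise, b, [1.1 * x for x in b]]
print(max_product_chain(profiles))  # [1, 1, 1, 0, 1, 1]
```

Here the weak, poorly correlated probe is switched off, while the strongly co-expressed neighbors stay on. A real gene-inference model must also decide which "on" exons belong to the same gene and which belong to different genes. This binary toy leaves that question out.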