Identification and analysis of mouse non-coding RNA using transcriptome data
Yuhui Zhao, Wanfei Liu, Jingyao Zeng, Shoucheng Liu, Xinyu Tan, Hasanawad Aljohi, Songnian Hu
DOI: https://doi.org/10.1007/s11427-015-4929-x
Abstract: Transcript expression is spatially and temporally regulated in a complex, precise and specific manner; however, most studies have focused on protein-coding genes. Recently, massively parallel cDNA sequencing (RNA-seq) has emerged as a new and promising tool for transcriptome research, and large numbers of non-coding RNAs, especially lincRNAs, have been identified and characterized as important regulators of diverse biological processes. In this study, we used ultra-deep RNA-seq data from 15 mouse tissues to study the diversity and dynamics of non-coding RNAs in mouse. Using our own criteria, we identified a total of 16,249 non-coding genes (21,569 non-coding RNAs). We annotated these non-coding RNAs by diverse properties and found that, compared with protein-coding genes, they are generally shorter, have fewer exons, are expressed at lower levels and are more strikingly tissue-specific. Moreover, these non-coding RNAs show significant enrichment of transcriptional initiation and elongation signals, including histone modifications (H3K4me3, H3K27me3 and H3K36me3), RNAPII binding sites and CAGE tags. Gene set enrichment analysis (GSEA) revealed several sets of lincRNAs associated with diverse biological processes such as immune effector process, muscle development and sexual reproduction. Taken together, this study provides a more comprehensive annotation of mouse non-coding RNAs and a basis for future functional and evolutionary studies of them.