Enhanced bovine genome annotation through integration of transcriptomics and epi-transcriptomics datasets facilitates genomic biology

Hamid Beiki,Brenda M Murdoch,Carissa A Park,Chandlar Kern,Denise Kontechy,Gabrielle Becker,Gonzalo Rincon,Honglin Jiang,Huaijun Zhou,Jacob Thorne,James E Koltes,Jennifer J Michal,Kimberly Davenport,Monique Rijnkels,Pablo J Ross,Rui Hu,Sarah Corum,Stephanie McKay,Timothy P L Smith,Wansheng Liu,Wenzhi Ma,Xiaohui Zhang,Xiaoqing Xu,Xuelei Han,Zhihua Jiang,Zhi-Liang Hu,James M Reecy
DOI: https://doi.org/10.1093/gigascience/giae019
IF: 7.658
2024-04-17
GigaScience
Abstract:
Background: The accurate identification of the functional elements in the bovine genome is a fundamental requirement for high-quality analysis of data informing both genome biology and genomic selection. Functional annotation of the bovine genome was performed to identify a more complete catalog of transcript isoforms across bovine tissues.
Results: A total of 160,820 unique transcripts (50% protein coding) representing 34,882 unique genes (60% protein coding) were identified across tissues. Among them, 118,563 transcripts (73% of the total) were structurally validated by independent datasets (PacBio isoform sequencing data, Oxford Nanopore Technologies sequencing data, de novo assembled transcripts from RNA sequencing data) and comparison with Ensembl and NCBI gene sets. In addition, all transcripts were supported by extensive data from different technologies such as whole transcriptome termini site sequencing, RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression, chromatin immunoprecipitation sequencing, and assay for transposase-accessible chromatin using sequencing. A large proportion of identified transcripts (69%) were unannotated, of which 86% were produced by annotated genes and 14% by unannotated genes. A median of two 5′ untranslated regions were expressed per gene. Around 50% of protein-coding genes in each tissue were bifunctional and transcribed both coding and noncoding isoforms. Furthermore, we identified 3,744 genes that functioned as noncoding genes in fetal tissues but as protein-coding genes in adult tissues. Our new bovine genome annotation extended more than 11,000 annotated gene borders compared to Ensembl or NCBI annotations. The resulting bovine transcriptome was integrated with publicly available quantitative trait loci data to study tissue–tissue interconnection involved in different traits and construct the first bovine trait similarity network.
Conclusions: These validated results show significant improvement over current bovine genome annotations.
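The bifunctional-gene and fetal/adult statistics in the abstract reduce to set operations over isoform biotypes. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a flat list of hypothetical `(tissue, gene_id, transcript_id, biotype)` records, with each biotype labeled `"coding"` or `"noncoding"`. A gene is treated as protein coding in a tissue if it expresses at least one coding isoform there. "Noncoding in fetal, coding in adult" is interpreted as a gene whose fetal isoforms are all noncoding and that has at least one coding isoform in an adult tissue.

```python
from collections import defaultdict

# Hypothetical record layout (an assumption, not the paper's format):
# (tissue, gene_id, transcript_id, biotype), biotype is "coding" or "noncoding".
Record = tuple[str, str, str, str]


def biotypes_by_tissue(records: list[Record]) -> dict[str, dict[str, set[str]]]:
    """Map tissue -> gene -> set of isoform biotypes expressed in that tissue."""
    table: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for tissue, gene, _transcript, biotype in records:
        table[tissue][gene].add(biotype)
    return table


def bifunctional_fraction(records: list[Record]) -> dict[str, float]:
    """Per tissue, fraction of coding genes that also express a noncoding isoform."""
    result = {}
    for tissue, genes in biotypes_by_tissue(records).items():
        # Assumption: a gene is "protein coding" here if any isoform is coding.
        coding = [g for g, kinds in genes.items() if "coding" in kinds]
        both = [g for g in coding if "noncoding" in genes[g]]
        result[tissue] = len(both) / len(coding) if coding else 0.0
    return result


def fetal_noncoding_adult_coding(records: list[Record], fetal: set[str]) -> set[str]:
    """Genes with only noncoding isoforms in fetal tissues but a coding isoform in an adult tissue."""
    fetal_kinds: dict[str, set[str]] = defaultdict(set)
    adult_kinds: dict[str, set[str]] = defaultdict(set)
    for tissue, gene, _transcript, biotype in records:
        # Any tissue not listed in `fetal` is treated as adult (assumption).
        (fetal_kinds if tissue in fetal else adult_kinds)[gene].add(biotype)
    return {
        g for g, kinds in fetal_kinds.items()
        if kinds == {"noncoding"} and "coding" in adult_kinds.get(g, set())
    }


if __name__ == "__main__":
    records = [
        ("adult_liver", "G1", "T1", "coding"),
        ("adult_liver", "G1", "T2", "noncoding"),
        ("adult_liver", "G2", "T3", "coding"),
        ("fetal_liver", "G2", "T4", "noncoding"),
    ]
    print(bifunctional_fraction(records))  # {'adult_liver': 0.5, 'fetal_liver': 0.0}
    print(fetal_noncoding_adult_coding(records, {"fetal_liver"}))  # {'G2'}
```

In this toy input, G1 is bifunctional in adult liver and G2 is not, which gives 0.5. G2 is noncoding-only in fetal liver but coding in adult liver, so it is the only gene reported by the second function.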
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses the need for a more accurate functional annotation of the bovine genome. Specifically, the researchers aim to build a more complete catalog of bovine transcript isoforms by integrating transcriptomics and epitranscriptomics data. This catalog in turn supports research in genome biology and genomic selection. The main objectives of the paper are as follows:

1. **Increase transcriptome complexity**: Integrate multiple datasets (such as PacBio long-read sequencing, Oxford Nanopore Technologies sequencing, and RNA sequencing) to bring the complexity of the annotated bovine transcriptome closer to that of the extensively annotated human genome.
2. **Improve gene annotation**: Improve the annotation of protein-coding genes and also of noncoding RNAs, including long noncoding RNAs, small noncoding RNAs, and miRNA genes.
3. **Integrate multi-omics data**: Combine the transcriptome with publicly available quantitative trait loci (QTL) and gene association data to study tissue–tissue interconnections involved in different traits and to construct the first bovine trait similarity network.
4. **Validate results**: Structurally validate the predicted transcripts and gene structures against independent datasets (PacBio isoform sequencing, Oxford Nanopore sequencing, de novo assembled RNA-seq transcripts, and the Ensembl and NCBI gene sets). Further support them with whole transcriptome termini site sequencing (WTTS-seq), RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE), chromatin immunoprecipitation sequencing (ChIP-seq), and assay for transposase-accessible chromatin using sequencing (ATAC-seq).

In summary, the study aims to improve the quality of bovine genome functional annotation by integrating diverse data types and technologies. The goal is a better understanding of the mechanisms underlying genome biology and genomic selection.
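Structural validation means checking each predicted transcript against independent evidence. The exact matching rules used in the paper are not stated here, so this sketch assumes one common criterion, the one behind gffcompare's `=` class code. Under that criterion, a multi-exon transcript counts as supported if some evidence model has an identical intron chain. An evidence model could be a long read, a de novo assembled transcript, or an Ensembl/NCBI transcript. The helper names are hypothetical. Exons are lists of 1-based, inclusive `(start, end)` coordinates, assumed to lie on the same chromosome and strand.

```python
# Hypothetical transcript model: exon (start, end) pairs, 1-based and inclusive,
# all on one chromosome and strand. Real pipelines would parse these from GTF/GFF.
Exons = list[tuple[int, int]]


def intron_chain(exons: Exons) -> tuple[tuple[int, int], ...]:
    """Return the introns (gaps between consecutive exons) as (start, end) pairs."""
    ordered = sorted(exons)
    return tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )


def structurally_supported(query: Exons, evidence: list[Exons]) -> bool:
    """True if a multi-exon query shares its exact intron chain with any evidence model.

    Single-exon transcripts have no introns, so this criterion cannot confirm them;
    they return False here and would need a separate overlap-based rule.
    """
    chain = intron_chain(query)
    if not chain:
        return False
    return any(intron_chain(model) == chain for model in evidence)


if __name__ == "__main__":
    query = [(100, 200), (300, 400), (500, 600)]
    long_read = [(50, 200), (300, 400), (500, 650)]  # different ends, same introns
    print(structurally_supported(query, [long_read]))  # True
```

Terminal exon boundaries may differ under this rule, which is why the long read with a longer first and last exon still counts as a match. A stricter check could also require the transcript start and end to fall within a tolerance window.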