Comprehensive genome annotation of the model ciliate by in-depth epigenetic and transcriptomic profiling

Fei Ye,Xiao Chen,Aili Ju,Yalan Sheng,Lili Duan,Khaled A. S. Al-Rasheid,Naomi A. Stover,Shan Gao
DOI: https://doi.org/10.1101/2024.01.31.578305
2024-02-01
Abstract:The ciliate is a well-established unicellular model eukaryote, contributing significantly to foundational biological discoveries. Despite its acknowledged importance, current biology studies face challenges due to gene annotation inaccuracy, particularly the notable absence of untranslated regions (UTRs). To comprehensively annotate the macronuclear genome, we collected extensive transcriptomic data spanning various cell stages. To ascertain transcript orientation and transcription start/end sites, we incorporated data of epigenetic marks displaying enrichment towards the 5’ end of gene bodies, including H3 lysine 4 tri-methylation (H3K4me3), H2A.Z, nucleosomes, and N -methyldeoxyadenine (6mA). Additionally, we integrated Nanopore direct sequencing (DRS), strand-specific RNA-seq, and ATAC-seq data. Using a newly-developed bioinformatic pipeline, coupled with manual curation and experimental validation, our work yielded substantial improvements to the current gene models, including the addition of 2,481 new genes, updates to 6,257 existing genes, and the incorporation of 5,917 alternatively spliced isoforms. Furthermore, novel UTR information was annotated for 26,223 high-confidence genes. Intriguingly, 16% of protein-coding genes were identified to have natural antisense transcripts (NATs) characterized by high diversity in alternative splicing, thus offering insights into understanding transcriptional regulation. Our work will enhance the utility of as a robust genetic toolkit for advancing biological research.
Bioinformatics
What problem does this paper attempt to address?