Comprehensive genome annotation of the model ciliate Tetrahymena thermophila by in-depth epigenetic and transcriptomic profiling
Fei Ye, Xiao Chen, Aili Ju, Yalan Sheng, Lili Duan, Khaled A. S. Al-Rasheid, Naomi A. Stover, Shan Gao
DOI: https://doi.org/10.1101/2024.01.31.578305
2024-01-01
Abstract: The ciliate Tetrahymena thermophila is a well-established unicellular model eukaryote that has contributed significantly to foundational biological discoveries. Despite its acknowledged importance, current Tetrahymena biology studies are hampered by inaccurate gene annotation, particularly the notable absence of untranslated regions (UTRs). To comprehensively annotate the Tetrahymena macronuclear genome, we collected extensive transcriptomic data spanning various cell stages. To ascertain transcript orientation and transcription start/end sites, we incorporated data on epigenetic marks that are enriched towards the 5’ end of gene bodies, including H3 lysine 4 tri-methylation (H3K4me3), H2A.Z, nucleosomes, and N6-methyldeoxyadenine (6mA). Additionally, we integrated Nanopore direct RNA sequencing (DRS), strand-specific RNA-seq, and ATAC-seq data. Using a newly developed bioinformatic pipeline, coupled with manual curation and experimental validation, our work yielded substantial improvements to the current gene models, including the addition of 2,481 new genes, updates to 6,257 existing genes, and the incorporation of 5,917 alternatively spliced isoforms. Furthermore, novel UTR information was annotated for 26,223 high-confidence genes. Intriguingly, 16% of protein-coding genes were found to have natural antisense transcripts (NATs) characterized by high diversity in alternative splicing, offering insights into transcriptional regulation. Our work will enhance the utility of Tetrahymena as a robust genetic toolkit for advancing biological research.
### Competing Interest Statement
The authors have declared no competing interest.