Phylogenetic comparison and splice site conservation of eukaryotic U1 snRNP-specific U1-70K gene family

Tao Fan, Yu-Zhen Zhao, Jing-Fang Yang, Qin-Lai Liu, Yuan Tian, Das Debatosh, Ying-Gao Liu, Jianhua Zhang, Chen Chen, Mo-Xian Chen, Shao-Ming Zhou
DOI: https://doi.org/10.1038/s41598-021-91693-3
IF: 4.6
2021-06-17
Scientific Reports
Abstract: Eukaryotic cells can expand their coding ability by using their splicing machinery, the spliceosome, to process precursor mRNA (pre-mRNA) into mature messenger RNA. The mega-macromolecular spliceosome contains multiple subcomplexes, referred to as small nuclear ribonucleoproteins (snRNPs). Among these, U1 snRNP and its central component, U1-70K, are crucial for splice site recognition during early spliceosome assembly. Human U1-70K has been linked to several types of autoimmune and neurodegenerative diseases. However, its phylogenetic relationships have seldom been reported. To this end, we carried out a systematic analysis of 95 animal U1-70K genes and compared these proteins to their yeast and plant counterparts. Analysis of their gene and protein structures, expression patterns and splicing conservation suggests that animal U1-70Ks are conserved in their molecular function and may play essential roles in cancers and juvenile development. In particular, animal U1-70Ks display the unique characteristics of a single copy number and a splicing isoform with a truncated C-terminus, suggesting a specific role for these U1-70Ks in the animal kingdom. In summary, our results provide a phylogenetic overview of the U1-70K gene family in vertebrates. The in silico analyses conducted in this work will act as a reference for future functional studies of this crucial U1 splicing factor in the animal kingdom.
multidisciplinary sciences
What problem does this paper attempt to address?
The problems this paper addresses fall into four areas:

1. **Explore the phylogenetic relationships of the U1-70K gene family in animals**: By analyzing 95 animal U1-70K genes, study their evolutionary relationships across species, especially in comparison with homologous genes in yeast and plants.
2. **Analyze the conservation of U1-70K gene and protein structures**: Through multiple sequence alignment and domain prediction, explore the conservation and diversity of U1-70K proteins across species, especially the conservation of their RNA recognition motif (RRM) and U1snRNP70_N domains.
3. **Study the expression patterns and splicing conservation of the U1-70K gene**: By analyzing the transcript isoforms and splice sites of the U1-70K gene, explore how its expression patterns and splicing are conserved across species.
4. **Explore the potential roles of U1-70K in diseases**: The paper notes that U1-70K has been linked to several human autoimmune and neurodegenerative diseases, so studying its conservation and function across species may help in understanding the molecular mechanisms of these diseases.

Specifically, the paper uses the following methods:

- **Sequence identification and alignment**: Use BLASTp searches and HMMER to identify the 95 animal U1-70K protein sequences and perform multiple sequence alignment.
- **Phylogenetic tree construction**: Use MUSCLE v3.8 for multiple sequence alignment, PhyML v3.0 to construct a maximum-likelihood phylogenetic tree, and FigTree v1.4.3 for visualization.
- **Protein domain and conserved motif analysis**: Predict protein domains with HMMER and analyze conserved motifs with the MEME tool.
- **Homology modeling and amino acid conservation assessment**: Use the ConSurf server to assess amino acid conservation and the SWISS-MODEL server for homology modeling.
- **Protein-protein interaction network analysis**: Use the STRING database to construct a protein interaction network.
- **Genome organization and conserved motif analysis**: Download the exon-intron structure of each U1-70K gene from the Ensembl database, reconstruct it, and analyze its conserved motifs.
- **Transcript isoform and conserved splice site analysis**: Extract all available U1-70K transcript isoforms from the Ensembl database and analyze their splicing patterns and splice sites.

Through these methods, the paper aims to provide the phylogenetic background of the U1-70K gene family in animals and a reference for further functional studies.
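The splice site analysis step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequence, and the coordinate convention (0-based, half-open exon intervals on the forward strand) are all assumptions made for illustration. Given a gene sequence and its exon coordinates, it derives the introns and reports the donor and acceptor dinucleotides at each intron boundary, so that isoforms or species can be compared for canonical GT-AG usage.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end); an assumed convention


def intron_intervals(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons (exons must be sorted, non-overlapping)."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def splice_site_dinucleotides(seq: str, exons: List[Interval]) -> List[Tuple[str, str]]:
    """Return (donor, acceptor) dinucleotides for each intron on the forward strand."""
    sites = []
    for start, end in intron_intervals(exons):
        intron = seq[start:end].upper()
        if len(intron) < 4:
            continue  # too short to carry distinct donor and acceptor sites
        sites.append((intron[:2], intron[-2:]))
    return sites


def canonical_fraction(sites: List[Tuple[str, str]]) -> float:
    """Fraction of introns with the canonical GT-AG boundary pair."""
    if not sites:
        return 0.0
    return sum(1 for pair in sites if pair == ("GT", "AG")) / len(sites)


# Hypothetical toy gene: three exons separated by a GT-AG and a GC-AG intron.
EXON_1, INTRON_1 = "ATGGCC", "GTAAGTTTTCAG"
EXON_2, INTRON_2 = "GGATCC", "GCAAAAAAAG"
EXON_3 = "TTTTAA"
toy_gene = EXON_1 + INTRON_1 + EXON_2 + INTRON_2 + EXON_3
toy_exons = [(0, 6), (18, 24), (34, 40)]

toy_sites = splice_site_dinucleotides(toy_gene, toy_exons)
print(toy_sites)                      # [('GT', 'AG'), ('GC', 'AG')]
print(canonical_fraction(toy_sites))  # 0.5
```

Real annotation would need two adjustments before this sketch applies: Ensembl reports 1-based, inclusive coordinates, and genes on the reverse strand must be reverse-complemented before the boundary dinucleotides are read.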