Investigating the Evolution of Green Algae with a Large Transcriptomic Dataset

David A. Ferranti, Charles F. Delwiche
DOI: https://doi.org/10.1101/2024.02.21.581324
2024-02-22
Abstract: The colonization of land by plants, thought to have occurred approximately 450-500 million years ago (Ma), is one of the most important events in the history of life on Earth. Land plants, hereafter referred to as “embryophytes,” comprise the foundation of every terrestrial biome, making them an essential lineage for the origin and maintenance of biodiversity. The embryophytes form a monophyletic clade within one of the two major phyla of the green algae, the Streptophyta. Estimates from fossil data and molecular clock analyses suggest the charophytes diverged from the other main phylum of green algae, the Chlorophyta, as much as 1500 Ma. Here we present a phylogenetic analysis using transcriptomic and genomic data of 62 green algae and embryophyte operational taxonomic units, 31 of which were assembled for this project. We focus on identifying the charophyte lineage that is sister to embryophytes, and show that the Zygnematophyceae have the strongest support, followed by the Charophyceae. We demonstrate that this phylogenetic tree topology is robust across different phylogenetic models and methods. Furthermore, we examine amino acid and codon usage across the tree and find these data broadly follow the phylogenetic tree. We conclude by searching the dataset for several protein domains and gene families known to be important in embryophytes, including the ethylene signaling pathway and various ion transporters. Many of these domains and genes have homologous sequences in the charophyte lineages, giving insight into the processes that underlay the colonization of the land by plants.
Evolutionary Biology
What problem does this paper attempt to address?
The paper sets out to identify the green algal lineage most closely related to land plants ("embryophytes"). The authors analyze a large transcriptomic and genomic dataset covering 62 operational taxonomic units of green algae and embryophytes. They find the strongest support for the Zygnematophyceae as the closest lineage, followed by the Charophyceae. They also examine amino acid and codon usage patterns across lineages and search for several protein domains and gene families that are important in embryophytes, to understand how these features relate to the evolution of land plants.

### Main Research Questions:

1. **Identify the green algal lineage most closely related to land plants (embryophytes)**: Use large-scale transcriptome and genome data to determine which lineage is sister to the embryophytes.
2. **Verify consistency across phylogenetic models and methods**: Check that the inferred tree topology is robust under different models and methods.
3. **Explore amino acid and codon usage patterns**: Analyze amino acid and codon usage among lineages to understand their evolutionary characteristics.
4. **Search for key protein domains and gene families**: Search for protein domains and gene families tied to important biological pathways in embryophytes, particularly the ethylene signaling pathway and various ion transport proteins.

### Research Background:

- **Origin of land plants**: The emergence of land plants approximately 450 to 500 million years ago is one of the most significant events in the history of life on Earth, with a profound impact on ecosystems and biodiversity.
- **Classification of green algae**: Green algae are divided into two major clades, the Streptophyta and the Chlorophyta. The Streptophyta includes land plants and their close relatives, while the Chlorophyta encompasses most of the green algae diversity.
- **Key issue**: Identifying the green algal lineage most closely related to land plants is crucial for understanding how plants transitioned from aquatic to terrestrial environments.

### Research Methods:

- **Data generation**: Generate RNA-Seq datasets for 32 green algae species using high-throughput sequencing, followed by assembly and filtering.
- **Homologous gene identification**: Identify homologous genes using hidden Markov models (HMMs) and construct multiple sequence alignments.
- **Phylogenetic analysis**: Construct phylogenetic trees by maximum likelihood and test their robustness with various models and methods.
- **Amino acid and codon usage analysis**: Study amino acid and codon usage patterns among lineages using principal component analysis (PCA).
- **Protein domain and gene family search**: Search for protein domains and gene families related to the ethylene signaling pathway and ion transport proteins.

### Research Results:

- **Phylogenetic tree**: The Zygnematophyceae is the green algal lineage most closely related to land plants, followed by the Charophyceae.
- **Amino acid and codon usage**: Amino acid and codon usage patterns among lineages are generally consistent with the phylogenetic tree.
- **Protein domains and gene families**: Many protein domains and gene families related to the ethylene signaling pathway and ion transport proteins have homologous sequences in the Zygnematophyceae, providing clues for further research on the evolution of land plants.

Through these analyses, the authors aim to better understand the origin of land plants and the molecular mechanisms of their adaptation to terrestrial environments.
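
The codon usage step listed under Research Methods can be illustrated with a small sketch. This is not the authors' pipeline: the taxon names and sequences below are hypothetical toy data, and a pure-standard-library power iteration stands in for whatever PCA implementation the paper used. It only shows the general idea of turning each taxon's coding sequence into a 64-dimensional codon frequency vector and projecting the taxa onto the first principal component.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every taxon maps to a comparable vector.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Relative frequency of each codon in one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)  # codons with ambiguous bases are skipped
    if total == 0:
        raise ValueError("no unambiguous codons found")
    return [counts[c] / total for c in CODONS]


def center(rows):
    """Subtract the per-column mean from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_principal_component(rows, iterations=200):
    """Power iteration on X^T X of the centered data.

    Returns (unit loading vector, per-sample scores). The sign of the
    component is arbitrary, as in any PCA.
    """
    X = center(rows)
    # Start from the largest centered row: it lies in the row space of X,
    # unlike a uniform vector, which is orthogonal to centered frequency rows.
    v = max(X, key=lambda r: sum(x * x for x in r))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("all samples are identical")
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in X]  # X v
        w = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(len(v))]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in X]
    return v, scores


# Hypothetical toy sequences: two GC-rich and two AT-rich "taxa".
toy_cds = {
    "taxon_A": "ATGGCCGCGGGCCTGGCC",
    "taxon_B": "ATGGCGGCCGGGCTGGCG",
    "taxon_C": "ATGAAATTTAATTTAAAA",
    "taxon_D": "ATGAATTTTAAATTAAAT",
}
matrix = [codon_frequencies(seq) for seq in toy_cds.values()]
pc1, scores = first_principal_component(matrix)
for name, score in zip(toy_cds, scores):
    print(f"{name}\tPC1 score = {score:+.3f}")
```

On real data, each row would be built from all coding sequences of a taxon rather than a single short string, and more than one component would normally be inspected. Taxa with similar codon usage land close together on the component, which is the sense in which usage patterns can be compared against a phylogenetic tree.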