Harnessing the predicted maize pan-interactome for putative gene function prediction and prioritization of candidate genes for important traits

Elly Poretsky, Halise Busra Cagirici, Carson M Andorf, Taner Z Sen
DOI: https://doi.org/10.1093/g3journal/jkae059
2024-03-16
Abstract: The recent assembly and annotation of the 26 maize nested association mapping population founder inbreds have enabled large-scale pan-genomic comparative studies. These studies have expanded our understanding of agronomically important traits by integrating pan-transcriptomic data with trait-specific gene candidates from previous association mapping results. In contrast to the availability of pan-transcriptomic data, obtaining reliable protein–protein interaction (PPI) data has remained a challenge due to its high cost and complexity. We generated predicted PPI networks for each of the 26 genomes using the established STRING database. The individual genome-interactomes were then integrated to generate core- and pan-interactomes. We deployed the PPI clustering algorithm ClusterONE to identify numerous PPI clusters that were functionally annotated using gene ontology (GO) functional enrichment, demonstrating a diverse range of enriched GO terms across different clusters. Additional cluster annotations were generated by integrating gene coexpression data and gene description annotations, providing additional useful information. We show that the functionally annotated PPI clusters establish a useful framework for protein function prediction and prioritization of candidate genes of interest. Our study not only provides a comprehensive resource of predicted PPI networks for 26 maize genomes but also offers annotated interactome clusters for predicting protein functions and prioritizing gene candidates. The source code for the Python implementation of the analysis workflow and a standalone web application for accessing the analysis results are available at https://github.com/eporetsky/PanPPI.
genetics & heredity
What problem does this paper attempt to address?
The problems that this paper attempts to solve mainly focus on the following aspects:

1. **Gene function prediction**: By generating and analyzing predicted protein–protein interaction (PPI) networks for 26 maize (*Zea mays*) genomes, this study aims to improve the ability to predict the functions of maize genes. Because high-throughput PPI experiments are costly and technically difficult, the researchers use an existing resource (the STRING database) to predict these interactions and from them construct the maize pan-interactome and core-interactome.
2. **Candidate gene prioritization**: The study not only provides predicted PPI networks for 26 maize genomes but also uses functionally annotated PPI clusters to predict protein functions and prioritize candidate genes related to specific traits. This helps to identify causal genes that may affect important agronomic traits such as plant architecture, height, flowering time, grain weight, and the abundance of different metabolites.
3. **Integrating multi-omics data**: To better understand gene functions and their roles in complex biological systems, the study integrates gene co-expression data and gene description annotations as additional cluster annotations. This integration not only helps to predict protein functions but can also be used to verify the regulatory roles of proteins in complex signaling pathways.

Specifically, the study addresses these problems through the following steps:

- **Generating predicted PPI networks**: The PPI networks of 26 maize genomes are predicted using the STRING database.
- **Constructing the pan-interactome and core-interactome**: The PPI networks of the 26 individual genomes are mapped to unified pan-gene IDs to generate the pan-interactome and core-interactome.
- **PPI network clustering**: The PPI networks are clustered using the ClusterONE algorithm, and the resulting clusters are then functionally annotated.
- **Functional annotation and enrichment analysis**: PPI clusters are annotated through GO term enrichment analysis and gene co-expression data, improving the interpretability of the network.
- **Application examples**: A search for GO terms related to flowering time shows how the functional annotation of PPI clusters can be used to infer potential gene functions and prioritize candidate genes.

In conclusion, this study provides a comprehensive framework for generating and analyzing predicted PPI networks of maize, thereby improving gene function prediction and candidate gene prioritization and offering a useful tool for crop improvement and for understanding complex biological systems.
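
Two of the steps above, mapping per-genome networks onto pan-gene IDs and testing a cluster for GO term enrichment, can be illustrated with a minimal Python sketch. This is not the authors' code (their implementation is in the PanPPI repository). The function names, the toy gene and pan-gene IDs, and the rule that a "core" edge must appear in every genome are all assumptions made for illustration, and the enrichment test shown is a plain one-sided hypergeometric test without multiple-testing correction.

```python
from collections import Counter
from math import comb


def to_pan_edges(edges, gene_to_pan):
    """Map one genome's gene-level edges to unordered pan-gene ID pairs."""
    pan_edges = set()
    for a, b in edges:
        pa, pb = gene_to_pan.get(a), gene_to_pan.get(b)
        # Skip genes without a pan-gene ID and self-pairs created by mapping.
        if pa is None or pb is None or pa == pb:
            continue
        pan_edges.add(frozenset((pa, pb)))
    return pan_edges


def build_interactomes(genome_edges, gene_to_pan):
    """Return (pan, core) edge sets from a dict of genome name -> edge list.

    Assumption: a pan edge occurs in at least one genome, and a core edge
    occurs in every genome. The paper's exact criteria may differ.
    """
    counts = Counter()
    for edges in genome_edges.values():
        counts.update(to_pan_edges(edges, gene_to_pan))
    n_genomes = len(genome_edges)
    pan = set(counts)
    core = {edge for edge, seen in counts.items() if seen == n_genomes}
    return pan, core


def enrichment_pvalue(cluster, annotated, background_size):
    """One-sided hypergeometric P(X >= k) for one GO term in one cluster.

    cluster: set of pan-gene IDs in the PPI cluster
    annotated: set of background pan-gene IDs carrying the GO term
    background_size: total number of pan-genes in the background
    """
    n, big_k = len(cluster), len(annotated)
    k = len(cluster & annotated)
    total = comb(background_size, n)
    hits = sum(
        comb(big_k, i) * comb(background_size - big_k, n - i)
        for i in range(k, min(n, big_k) + 1)
    )
    return hits / total


# Toy data: two hypothetical genomes whose genes map to shared pan-gene IDs.
genome_edges = {
    "genomeA": [("a1", "a2"), ("a2", "a3")],
    "genomeB": [("b1", "b2")],
}
gene_to_pan = {
    "a1": "pan1", "a2": "pan2", "a3": "pan3",
    "b1": "pan1", "b2": "pan2",
}
pan, core = build_interactomes(genome_edges, gene_to_pan)
p_value = enrichment_pvalue({"pan1", "pan2", "pan3"}, {"pan1", "pan2"}, 10)
```

In this toy input the pan1/pan2 edge is the only one found in both genomes, so it is the only core edge, while the pan-interactome holds both mapped edges. Storing edges as `frozenset` pairs makes the comparison independent of the order in which the two interactors are listed.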