Multi-omics network-based functional annotation of unknown Arabidopsis genes

Thomas Depuydt, Klaas Vandepoele
DOI: https://doi.org/10.1101/2021.06.17.448819
2021-06-17
Abstract: Unraveling gene functions is pivotal to understanding the signaling cascades controlling plant development and stress responses. Given that experimental profiling is costly and labor intensive, the need for high-confidence computational annotations is evident. In contrast to detailed gene-specific functional information, transcriptomics data is widely available in both model and crop species. Here, we developed a novel automated function prediction (AFP) algorithm, leveraging complementary information present in multiple expression datasets through the analysis of study-specific gene co-expression networks. Benchmarking the prediction performance on recently characterized Arabidopsis thaliana genes, we showed that our method outperforms state-of-the-art expression-based approaches. Next, we predicted biological process annotations for known (n=15,790) and unknown (n=11,865) genes in A. thaliana and validated our predictions using experimental protein-DNA and protein-protein interaction data (covering >220,000 interactions in total), obtaining a set of high-confidence functional annotations. In total, 5,054 (42.6%) unknown genes were assigned at least one validated annotation, and 3,408 (53.0%) genes with only computational annotations gained at least one novel validated function. These omics-supported functional annotations shed light on a variety of developmental processes and molecular responses, such as flower and root development, defense responses to fungi and bacteria, and phytohormone signaling, and help alleviate the knowledge gap of biological process annotations in Arabidopsis. An in-depth analysis of two context-specific networks, modeling seed development and response to water deprivation, shows how previously uncharacterized genes function within the respective networks. Moreover, our AFP approach can be applied in future studies to facilitate gene discovery for crop improvement.
Significance statement: For the majority of plant genes, it is unknown in which processes they are involved. Using a multi-omics approach, leveraging transcriptome, protein-DNA and protein-protein interaction data, we functionally annotated 42.6% of unknown Arabidopsis thaliana genes, providing insight into a variety of developmental processes and molecular responses, as well as a resource of annotations which can be explored by the community to facilitate future research.
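The abstract does not spell out the AFP algorithm, so the following is only a generic guilt-by-association sketch of the general idea of combining study-specific co-expression networks, not the authors' method. It assumes that each study yields its own network by thresholding Pearson correlation, and that a term is scored by the number of studies in which a co-expressed neighbor of the query gene carries it. The gene names, the threshold of 0.9, the toy expression values and the function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(vx * vy)


def coexpression_neighbors(expr, threshold=0.9):
    """Build one study-specific network: gene -> set of co-expressed genes.

    expr maps each gene to its expression values within a single study.
    The threshold is an illustrative assumption.
    """
    neighbors = defaultdict(set)
    for g1, g2 in combinations(sorted(expr), 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            neighbors[g1].add(g2)
            neighbors[g2].add(g1)
    return neighbors


def predict_functions(studies, annotations, query, threshold=0.9):
    """Rank terms by the number of studies supporting them for the query gene."""
    support = defaultdict(int)
    for expr in studies:
        if query not in expr:
            continue
        neighbors = coexpression_neighbors(expr, threshold)[query]
        terms = set()
        for gene in neighbors:
            terms |= annotations.get(gene, set())
        for term in terms:
            support[term] += 1  # one vote per study, not per neighbor
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: two studies, one unknown gene, three annotated genes.
study_a = {"geneX": [1, 2, 3, 4], "geneA": [2, 4, 6, 8], "geneB": [4, 3, 2, 1]}
study_b = {"geneX": [5, 1, 3, 2], "geneA": [10, 2, 6, 4], "geneC": [1, 5, 3, 4]}
annotations = {
    "geneA": {"root development"},
    "geneB": {"defense response"},
    "geneC": {"seed development"},
}

predictions = predict_functions([study_a, study_b], annotations, "geneX")
print(predictions)
```

In this toy setting, geneX is positively co-expressed with geneA in both studies and anti-correlated with geneB and geneC, so only the geneA term receives support. Counting one vote per study is a simple way to reward annotations that recur across independent datasets. A realistic pipeline would also need enrichment statistics and multiple-testing correction, which this sketch omits.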