AI-driven multi-omics integration for multi-scale predictive modeling of causal genotype-environment-phenotype relationships

You Wu, Lei Xie
2024-07-09
Abstract: Despite the wealth of single-cell multi-omics data, it remains challenging to predict the consequences of novel genetic and chemical perturbations in the human body. It requires knowledge of molecular interactions at all biological levels, encompassing disease models and humans. Current machine learning methods primarily establish statistical correlations between genotypes and phenotypes but struggle to identify physiologically significant causal factors, limiting their predictive power. Key challenges in predictive modeling include scarcity of labeled data, generalization across different domains, and disentangling causation from correlation. In light of recent advances in multi-omics data integration, we propose a new artificial intelligence (AI)-powered biology-inspired multi-scale modeling framework to tackle these issues. This framework will integrate multi-omics data across biological levels, organism hierarchies, and species to predict causal genotype-environment-phenotype relationships under various conditions. AI models inspired by biology may identify novel molecular targets, biomarkers, pharmaceutical agents, and personalized medicines for presently unmet medical needs.
Artificial Intelligence
What problem does this paper attempt to address?
This paper discusses how AI-driven multi-omics integration can be used to build multiscale predictive models of the causal relationships among genotypes, environments, and phenotypes. Current machine learning methods mainly establish statistical correlations between genotypes and phenotypes but struggle to identify physiologically meaningful causal factors, which limits their predictive power. The proposed framework would integrate multi-omics data across biological levels, species, and conditions to predict human phenotypic outcomes under different genetic and environmental perturbations. The key challenges the paper identifies are the scarcity of labeled data, generalization across domains, and the difficulty of distinguishing causation from correlation.
Recent advances in multi-omics technologies, such as genomics, transcriptomics, and proteomics, offer a way to address these challenges. These data capture the molecular landscape of various internal phenotypes and thus serve as a bridge between genotype and organismal phenotype. Integrating them can deepen our understanding of the complexity and interdependence of biological systems and help identify new drug targets, biomarkers, and personalized treatment strategies. The paper also highlights single-cell and spatial omics technologies, which make it possible to observe and quantify cellular heterogeneity and intercellular communication within tissues at single-cell resolution, linking molecular events to organismal phenotypes. Finally, integrating multi-omics data across species allows knowledge obtained from model systems to be translated to humans, advancing both basic and translational biomedical science.
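The correlation-versus-causation challenge can be made concrete with a toy simulation. This is an illustrative sketch, not a method from the paper: the variables E (an unobserved environmental exposure), M (a molecular marker) and P (a phenotype), the linear model, and the function names are all hypothetical. E drives both M and P while M has no effect on P, so M correlates strongly with P in observational data; setting M externally, analogous to a perturbation experiment, removes that correlation. Only the Python standard library is assumed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, intervene, seed=0):
    """Hypothetical structural causal model: E -> M and E -> P, no M -> P edge.

    If `intervene` is True, M is set externally (do(M)), cutting the E -> M edge.
    Returns the sample correlation between M and P.
    """
    rng = random.Random(seed)
    ms, ps = [], []
    for _ in range(n):
        e = rng.gauss(0, 1)              # unobserved confounder E
        if intervene:
            m = rng.gauss(0, 1)          # do(M): M no longer depends on E
        else:
            m = e + rng.gauss(0, 0.5)    # observational: M tracks E
        p = 2 * e + rng.gauss(0, 0.5)    # P depends only on E, never on M
        ms.append(m)
        ps.append(p)
    return pearson(ms, ps)


obs = simulate(5000, intervene=False)
intv = simulate(5000, intervene=True)
print(f"observational corr(M, P):  {obs:.2f}")
print(f"interventional corr(M, P): {intv:.2f}")
```

With these settings the observational correlation should land near its theoretical value, 2 / sqrt(1.25 * 4.25) ≈ 0.87, while the interventional correlation is close to zero. A model fit only to the observational data would treat M as highly predictive of P, yet perturbing M would not change P, which is the kind of limitation the paper attributes to correlation-based methods.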
The paper reviews existing perturbation omics data resources and the machine learning methods used for multi-omics data integration and predictive modeling, including unsupervised and supervised learning, knowledge graphs, and other techniques. Despite this progress, current methods still have limitations. The authors argue that new AI frameworks are needed to overcome these challenges, deepen our understanding of complex human traits and of the genetic and molecular basis of disease, and predict phenotypic responses under diverse genotypes and environmental influences.
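As one illustration of unsupervised multi-omics integration, the sketch below uses a simple "early integration" scheme chosen for clarity, not a method taken from the paper: each omics layer is z-scored and down-weighted by its feature count so larger layers do not dominate, the layers are concatenated, and PCA (computed via SVD) extracts shared latent factors. The helper `integrate_omics`, the layer names, and the simulated data are hypothetical; the code assumes NumPy and that every layer is measured on the same samples.

```python
import numpy as np


def integrate_omics(blocks, n_factors=2):
    """Toy early integration of several omics layers (hypothetical helper).

    blocks: dict mapping a layer name (e.g. "rna", "protein") to an
            (n_samples, n_features) array; rows must be the same samples.
    Returns an (n_samples, n_factors) array of shared latent factor scores.
    """
    scaled = []
    for X in blocks.values():
        X = np.asarray(X, dtype=float)
        # z-score every feature so layers with different units are comparable
        Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
        # divide by sqrt(n_features) so each layer has the same total variance
        Z /= np.sqrt(Z.shape[1])
        scaled.append(Z)
    joint = np.hstack(scaled)  # samples x (all features of all layers)
    # PCA via SVD; the columns are already centered by the z-scoring above
    U, S, _ = np.linalg.svd(joint, full_matrices=False)
    return U[:, :n_factors] * S[:n_factors]


# Usage on simulated data: one hidden factor drives both layers.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 1))  # hypothetical shared biological signal
rna = latent @ rng.normal(size=(1, 50)) + rng.normal(scale=0.5, size=(n, 50))
protein = latent @ rng.normal(size=(1, 10)) + rng.normal(scale=0.5, size=(n, 10))

emb = integrate_omics({"rna": rna, "protein": protein}, n_factors=2)
r = np.corrcoef(emb[:, 0], latent[:, 0])[0, 1]
print(f"|corr(first factor, hidden signal)| = {abs(r):.2f}")
```

The sign of a PCA factor is arbitrary, which is why the check uses the absolute correlation. Concatenation followed by PCA is only one of many integration strategies; it captures shared variation but, like the methods the paper critiques, says nothing about causal direction.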