Abstract: Despite the wealth of single-cell multi-omics data, it remains challenging to predict the consequences of novel genetic and chemical perturbations in the human body. Such prediction requires knowledge of molecular interactions at all biological levels, in both disease models and humans. Current machine learning methods primarily establish statistical correlations between genotypes and phenotypes but struggle to identify physiologically significant causal factors, limiting their predictive power. Key challenges in predictive modeling include scarcity of labeled data, generalization across different domains, and disentangling causation from correlation. In light of recent advances in multi-omics data integration, we propose a new artificial intelligence (AI)-powered biology-inspired multi-scale modeling framework to tackle these issues. This framework will integrate multi-omics data across biological levels, organism hierarchies, and species to predict causal genotype-environment-phenotype relationships under various conditions. AI models inspired by biology may identify novel molecular targets, biomarkers, pharmaceutical agents, and personalized medicines for presently unmet medical needs.

What problem does this paper attempt to address?

This paper addresses how to build a multi-scale predictive model, based on artificial intelligence (AI)-driven multi-omics integration, that captures and predicts the causal relationships among genotypes, environments, and phenotypes. Current machine learning methods mainly establish statistical correlations between genotypes and phenotypes but struggle to identify physiologically meaningful causal factors, which limits their predictive power. The proposed framework aims to integrate multi-omics data across biological hierarchies, species, and conditions to predict human phenotypic outcomes under different environmental perturbations. The key challenges the paper identifies are the scarcity of labeled data, cross-domain generalization, and the difficulty of distinguishing causality from correlation. Recent advances in multi-omics data, spanning genomics, transcriptomics, and proteomics, offer a way to address these issues. These data reveal the molecular landscape of different internal phenotypes, which serve as a bridge between genotypes and phenotypes. Integrating them gives a better understanding of the complexity and interdependence of biological systems and can help identify new drug targets, biomarkers, and personalized treatment strategies. The paper also highlights single-cell and spatial omics technologies, which make it possible to observe and quantify heterogeneity and intercellular communication within tissues at single-cell resolution, linking molecular events to organismal phenotypes. Furthermore, integrating multi-omics data across species allows knowledge obtained from model systems to be translated to humans, advancing basic and translational biomedical science.
The paper summarizes existing perturbation omics data resources and machine learning methods, including unsupervised and supervised learning, as well as the use of knowledge graphs and other technologies in multi-omics data integration and predictive modeling. Despite some progress, current methods remain limited, and the paper argues that new AI frameworks are needed to overcome these challenges, to deepen our understanding of complex human traits and the genetic and molecular basis of disease, and to predict phenotypic responses across genotypes and environmental influences.

AI-driven multi-omics integration for multi-scale predictive modeling of causal genotype-environment-phenotype relationships
