Abstract:

BACKGROUND: Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view of biological systems, and their integration is crucial for gaining insight into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor, and the use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors used or have only been applied to model systems. In this study, we reconstruct the regulatory networks underlying trans-QTL hotspots using human cohort data and data-driven prior information.

METHODS: We devised a new strategy to integrate QTL with human population-scale multi-omics data. State-of-the-art network inference methods, including BDgraph and glasso, were applied to these data. Comprehensive prior information to guide network inference was manually curated from large-scale biological databases. The inference approach was extensively benchmarked using simulated data and cross-cohort replication analyses. The best-performing methods were subsequently applied to real-world human cohort data.

RESULTS: Our benchmarks showed that prior-based strategies outperform methods without prior information in simulated data and replicate better across datasets. Application of our approach to human cohort data highlighted two novel regulatory networks related to schizophrenia and lean body mass, for which we generated novel functional hypotheses.

CONCLUSIONS: We demonstrate that existing biological knowledge can improve the integrative analysis of networks underlying trans associations and generate novel hypotheses about regulatory mechanisms.
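To make the idea of prior-guided network inference more concrete, the following is a minimal sketch of a graphical lasso with edge-specific penalties, solved with a standard ADMM scheme. It is not the authors' pipeline and does not reproduce the BDgraph or glasso R packages. The function names `weighted_glasso` and `prior_penalty`, the linear mapping from prior to penalty, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def prior_penalty(prior, lam=0.2, strength=0.9):
    """Hypothetical mapping from edge priors in [0, 1] to penalties.

    Edges with a higher prior receive a smaller penalty, so they are
    less likely to be shrunk to zero. The diagonal is left unpenalized.
    """
    pen = lam * (1.0 - strength * prior)
    np.fill_diagonal(pen, 0.0)
    return pen


def weighted_glasso(S, penalty, rho=1.0, n_iter=200):
    """ADMM sketch for: min -logdet(T) + tr(S T) + sum_ij penalty_ij * |T_ij|.

    S is the sample covariance, penalty a symmetric non-negative matrix.
    Returns the sparse estimate Z of the precision matrix.
    """
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    for _ in range(n_iter):
        # Theta update: closed form via eigendecomposition.
        d, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z update: elementwise soft-thresholding with per-edge thresholds.
        A = Theta + U
        Z = np.sign(A) * np.maximum(np.abs(A) - penalty / rho, 0.0)
        # Scaled dual update.
        U += Theta - Z
    return Z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p = 5
    # Simulated ground truth: a chain-structured precision matrix.
    truth = np.eye(p)
    for i in range(p - 1):
        truth[i, i + 1] = truth[i + 1, i] = 0.4
    X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(truth), size=500)
    S = np.cov(X, rowvar=False)
    # Assumed prior: high confidence on chain edges, low elsewhere.
    prior = np.where(truth != 0, 0.9, 0.1)
    est = weighted_glasso(S, prior_penalty(prior))
    print(np.round(est, 2))
```

In this sketch the prior only modulates how strongly each candidate edge is penalized; the data still decide whether an edge is retained. Other ways of using priors, such as Bayesian structure priors, are equally plausible readings of the abstract.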

Fourier-transform-based attribution priors improve the interpretability and stability of deep learning models for genomics

Principled feature attribution for unsupervised gene expression analysis

Improving performance of deep learning models with axiomatic attribution priors and expected gradients

Biophysical models of cis-regulation as interpretable neural networks

MFABA: A More Faithful and Accelerated Boundary-based Attribution Method for Deep Neural Networks

Understanding the Limitations of Deep Models for Molecular Property Prediction: Insights and Solutions

Network Reconstruction for Trans Acting Genetic Loci Using Multi-Omics Data and Prior Information

Explainable Fragment-Based Molecular Property Attribution

Incorporating Biological Knowledge with Factor Graph Neural Network for Interpretable Deep Learning

Using Attribution to Decode Dataset Bias in Neural Network Models for Chemistry

The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics

Enhancing Personalized Gene Expression Prediction From DNA Sequences Using Genomic Foundation Models

Incorporating graph information in Bayesian factor analysis with robust and adaptive shrinkage priors

A mechanistically interpretable neural network for regulatory genomics

Benchmarking DNA Foundation Models for Genomic Sequence Classification

A variational autoencoder trained with priors from canonical pathways increases the interpretability of transcriptome data

Multimodal learning of noncoding variant effects using genome sequence and chromatin structure

Scalable DNA Feature Generation and Transcription Factor Binding Prediction via Deep Surrogate Models

Visual Interpretable and Explainable Deep Learning Models for Brain Tumor MRI and COVID-19 Chest X-ray Images

Accurate and General DNA Representations Emerge from Genome Foundation Models at Scale