Abstract

Background: Computational models in medical research that use molecular features to predict patient sensitivities or outcomes are traditionally limited to producing a scalar output for a given disease phenotype, e.g. progression-free survival or drug response. Even for relatively intelligible models, it is generally difficult to place confidence limits on a prediction, or to develop an intuition for how sensitive it is to "what-if" changes in the input features. Moreover, although molecular features are essential for understanding and modeling tumor behavior, profiles from patient tumors in real-world settings are typically limited to hundreds of genes, whereas the set of biologically relevant inputs to the patient's condition can be much larger.

Approach: To generate the data required to enhance the interpretation of predictive models and extend the utility of real-world patient oncology datasets, a distribution of conditions consistent with the observed features and their incidence across cancer patient cohorts is required. We hypothesized that we could synthesize patient samples drawn from the joint probability distribution of a broader universe of features while the observed subset is held fixed at its observed values. We model this joint distribution by learning a Bayesian network over a broad feature set for which some training data are available, and we generate feature profiles by applying Gibbs sampling to the learned network.

Results: We assessed the potential clinical utility of these generated feature profiles for predicting drug response and for enhancing the biological interpretation of different model outputs. We found that generated somatic mutations were useful for predicting cancer patient outcomes, including drug response in breast cancer tumors from AACR Project GENIE. Synthesized patient feature profiles enhanced the biological interpretability of nominal panel data (100-200 genes), providing a means to assign probabilities to outcomes and uncovering previously described mechanisms of drug response in real-world patient data.

Conclusion: We introduce an approach that leverages Bayesian networks to synthesize richly annotated patient feature profiles from the limited molecular data routinely collected in real-world settings. This approach addresses challenges associated with limited molecular data, biological interpretability, and the evaluation of predictive models, and it enhances the utility of real-world datasets for cancer research.

Citation Format: Dillon H. Tracy, Jeff Sherman, Maayan Baron. Generative Bayesian networks for augmentation of molecular data from commercial genetics panels [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7373.
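Illustrative sketch (not the authors' implementation): the sampling step described in the Approach can be pictured as Gibbs sampling over a discrete Bayesian network in which the observed panel features are clamped and only the unobserved features are resampled. The abstract does not give the learned network, so everything below is an assumption made for illustration: the three binary nodes (`hidden_a`, `panel_gene`, `hidden_b`), their structure, and the hand-written conditional probability tables are hypothetical, and the full conditional is computed from the whole joint, which is practical only for toy-sized networks.

```python
import random

# Hypothetical toy network (an assumption, not from the abstract):
# node -> (parents, table of P(node = 1 | parent values)).
NETWORK = {
    "hidden_a": ((), {(): 0.3}),
    "panel_gene": (("hidden_a",), {(0,): 0.1, (1,): 0.8}),
    "hidden_b": (("hidden_a",), {(0,): 0.2, (1,): 0.7}),
}


def joint_probability(state):
    """Probability of a full 0/1 assignment under the network factorization."""
    p = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_one = cpt[tuple(state[parent] for parent in parents)]
        p *= p_one if state[node] == 1 else 1.0 - p_one
    return p


def gibbs_sample(observed, n_samples=5000, burn_in=500, seed=0):
    """Draw feature profiles with the observed nodes held fixed."""
    rng = random.Random(seed)
    free = [node for node in NETWORK if node not in observed]
    state = dict(observed)
    for node in free:
        state[node] = rng.randint(0, 1)  # arbitrary starting point

    samples = []
    for step in range(burn_in + n_samples):
        for node in free:
            # Full conditional of one node given all the others,
            # obtained by comparing the joint with node = 1 and node = 0.
            p1 = joint_probability({**state, node: 1})
            p0 = joint_probability({**state, node: 0})
            state[node] = 1 if rng.random() < p1 / (p0 + p1) else 0
        if step >= burn_in:
            samples.append(dict(state))
    return samples


# Synthesize profiles for a patient whose panel reports a mutation.
profiles = gibbs_sample({"panel_gene": 1})
freq_hidden_a = sum(s["hidden_a"] for s in profiles) / len(profiles)
print(f"Estimated P(hidden_a = 1 | panel_gene = 1): {freq_hidden_a:.3f}")
```

For this toy network the exact posterior is P(hidden_a = 1 | panel_gene = 1) = 0.24 / 0.31 ≈ 0.774, so the sampled frequency should land close to that value. Each retained sample is a complete synthetic profile, and the spread across samples is what would let a downstream predictor report a distribution of outcomes rather than a single scalar.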
