Abstract: Background: Computational models in medical research that use molecular features to predict patient sensitivities or outcomes are traditionally limited to producing a scalar output for a given disease phenotype, e.g. progression-free survival or drug response. Even for relatively intelligible models, it is generally difficult to place confidence limits on a prediction, to develop an intuition for how it varies under "what-if" changes in those features, or to judge how sensitive it is to changes in input variables. Moreover, while essential for understanding and modeling tumor behavior, molecular features from patient tumors in real-world settings are typically limited to profiles of hundreds of genes, while the set of biologically relevant inputs to the patient's condition can be much larger.

Approach: Enhancing the interpretation of predictive models and extending the utility of real-world patient oncology datasets requires a distribution of conditions consistent with the observed features and their incidence across cancer patient cohorts. We hypothesized that we could synthesize patient samples drawn from the joint probability distribution of a broader universe of features when a subset of them is held fixed at the observed values. We model this joint distribution by learning a Bayesian network over a broad feature set for which some training data is available, and we generate feature profiles by applying Gibbs sampling to the learned network.

Results: We assessed the potential clinical utility of these generated feature profiles for predicting drug response and for enhancing the biological interpretation of different model outputs. We found that generated somatic mutations were useful for predicting cancer patient outcomes, including drug response in breast cancer tumors from AACR Project GENIE. Synthesized patient feature profiles enhanced the biological interpretability of nominal panel data (100-200 genes), providing a means to assign probabilities to outcomes and uncovering previously described mechanisms of drug response in real-world patient data.

Conclusion: We introduce an approach that leverages Bayesian networks to synthesize richly annotated patient feature profiles from the limited molecular data routinely collected in real-world settings. This approach addresses challenges associated with limited molecular data, biological interpretability, and the evaluation of predictive models, and it enhances the utility of real-world datasets for cancer research.

Citation Format: Dillon H. Tracy, Jeff Sherman, Maayan Baron. Generative Bayesian networks for augmentation of molecular data from commercial genetics panels [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7373.
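The sampling step described in the Approach can be illustrated with a minimal sketch. It is not the authors' implementation: the three-feature network, the names `gene_a`, `gene_b`, `gene_c`, and all probabilities below are hypothetical, and the network structure is written by hand rather than learned from data. The sketch shows only how Gibbs sampling draws full feature profiles while an observed (panel) feature is held fixed.

```python
import random

# Hypothetical toy network over binary mutation flags (1 = mutated).
# PARENTS maps each feature to its parents; CPT gives P(feature = 1 | parent values).
# Structure and numbers are illustrative assumptions, not values from the abstract.
PARENTS = {"gene_a": (), "gene_b": ("gene_a",), "gene_c": ("gene_a",)}
CPT = {
    "gene_a": {(): 0.3},
    "gene_b": {(0,): 0.1, (1,): 0.8},
    "gene_c": {(0,): 0.2, (1,): 0.7},
}


def joint(state):
    """Joint probability of a full assignment under the network."""
    p = 1.0
    for var, parents in PARENTS.items():
        p1 = CPT[var][tuple(state[q] for q in parents)]
        p *= p1 if state[var] == 1 else 1.0 - p1
    return p


def gibbs_profiles(observed, n_samples=5000, burn_in=500, seed=0):
    """Sample full feature profiles with the observed features clamped."""
    rng = random.Random(seed)
    free = [v for v in PARENTS if v not in observed]
    state = dict(observed)
    for v in free:
        state[v] = rng.randint(0, 1)  # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        for v in free:
            # Conditional of v given all other features, obtained by
            # evaluating the joint at v = 1 and v = 0 and normalizing.
            w1 = joint({**state, v: 1})
            w0 = joint({**state, v: 0})
            state[v] = 1 if rng.random() < w1 / (w0 + w1) else 0
        if step >= burn_in:
            samples.append(dict(state))
    return samples


# Clamp the "panel" feature gene_b to mutated and synthesize the rest.
profiles = gibbs_profiles({"gene_b": 1})
p_c = sum(s["gene_c"] for s in profiles) / len(profiles)
print(f"Estimated P(gene_c = 1 | gene_b = 1) = {p_c:.3f}")
```

For this toy network the exact conditional works out by hand to about 0.587, so the printed estimate should land near that value. The fraction of sampled profiles with a given feature is what lets a probability be attached to an unobserved feature or a downstream outcome. Evaluating the full joint at every update is only practical for very small networks; a realistic implementation would restrict each update to the factors in the feature's Markov blanket.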

Abstract 7373: Generative Bayesian networks for augmentation of molecular data from commercial genetics panels