Abstract: Linking genotype to phenotype is the fundamental aim of modern genetics. We focus on the study of links between gene expression data and phenotype data through integrative analysis, and we propose three approaches. 1) The inherent complexity of phenotypes makes high-throughput phenotype profiling a difficult and laborious process. We propose a method of automated multi-dimensional profiling that uses gene expression similarity. Large-scale analysis shows that our method provides robust profiling that reveals different phenotypic aspects of samples. This profiling technique is also capable of interpolation and extrapolation beyond the phenotype information given in the training data. It can be used in many applications, including facilitating experimental design and detecting confounding factors. 2) Phenotype association analysis is complicated by small sample sizes and high dimensionality. Consequently, phenotype-associated gene subsets obtained from training data are very sensitive to the selection of training samples, and the resulting sample phenotype classifiers tend to generalize poorly. To overcome these obstacles, we propose a novel approach that generates sequences of increasingly discriminative gene cluster combinations. Our experiments on both simulated and real datasets show robust and accurate classification performance. 3) Many complex phenotypes, such as cancer, are the product not only of gene expression but also of gene interaction. We propose an integrative approach to find gene network modules that are activated under different phenotype conditions. Using our method, we discovered cancer subtype-specific network modules, as well as the ways in which these modules coordinate. In particular, we detected a breast cancer-specific tumor suppressor network module with a hub gene, PDGFRL, which may play an important role in this module.
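To make the idea of phenotype profiling by gene expression similarity concrete, here is a minimal sketch. It is not the method proposed in this work, whose details the abstract does not give. It assumes that each reference sample carries a set of phenotype terms and that a query sample can be scored on each term by its mean Pearson correlation with the reference samples annotated with that term. The function names, the scoring rule, and the toy data are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector carries no similarity signal
    return cov / (sd_x * sd_y)


def phenotype_profile(query, references):
    """Score a query sample on every phenotype term seen in the references.

    `references` is a list of (expression_vector, phenotype_terms) pairs.
    The score for a term is the mean correlation between the query and the
    reference samples annotated with that term (an assumed scoring rule).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for expression, terms in references:
        r = pearson(query, expression)
        for term in terms:
            sums[term] += r
            counts[term] += 1
    return {term: sums[term] / counts[term] for term in sums}


# Toy data: four genes, three annotated reference samples (hypothetical values).
references = [
    ([5.0, 1.0, 0.5, 4.0], {"tumor", "breast"}),
    ([4.5, 1.2, 0.7, 3.8], {"tumor", "lung"}),
    ([0.5, 4.0, 5.0, 1.0], {"normal", "breast"}),
]
query = [4.8, 0.9, 0.6, 4.1]

profile = phenotype_profile(query, references)
for term, score in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

The result is a multi-dimensional profile with one score per phenotype term rather than a single class label, which is the sense in which such profiling can expose several phenotypic aspects of a sample at once. A term such as "breast", shared here by dissimilar references, receives an intermediate score under this rule.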

Generation of comprehensible hypotheses from gene expression data

A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches

Advancing regulatory genomics with machine learning

A Machine Learning Approach to Simulate Gene Expression and Infer Gene Regulatory Networks

Studying Limits of Explainability by Integrated Gradients for Gene Expression Models

Computational Approaches for Disease Gene Identification

Learning the Regulatory Code of Gene Expression

A unified computational model for revealing and predicting subtle subtypes of cancers

Integrative analysis of gene expression and phenotype data

GENet: A Graph-Based Model Leveraging Histone Marks and Transcription Factors for Enhanced Gene Expression Prediction

Exploring Prognostic Gene Factors in Breast Cancer via Machine Learning

A Metastatic Cancer Expression Generator (MetGen): A Generative Contrastive Learning Framework for Metastatic Cancer Generation

Graph Based Link Prediction between Human Phenotypes and Genes

A Graph Informed Framework Empowering Gene Pathway Discovery and Gene Expression-Assisted Disease Classification

Convergent learning-based model for leukemia classification from gene expression

Prediction of a Gene Regulatory Network from Gene Expression Profiles With Linear Regression and Pearson Correlation Coefficient

A network-based machine-learning framework to identify both functional modules and disease genes

A Graph-Informed Modeling Framework Empowering Gene Pathway Discovery

Opening the Black Box: Interpretable Machine Learning for Geneticists.

Semi-Supervised Prediction of Gene Regulatory Networks Using Machine Learning Algorithms

DeepChrome: deep-learning for predicting gene expression from histone modifications