Abstract: Evaluating survival models that predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty predicting survival directly, both because observations are censored and because predictive power depends on the threshold used to define the two classes. In contrast, the traditional Cox regression approach does not allow for the identification of interactions between genomic features, which could play key roles in cancer prognosis. In addition, data integration is regarded as an important issue in improving the predictive power of survival models, since cancer can be caused by multiple alterations across meta-dimensional genomic data, including the genome, epigenome, transcriptome, and proteome. Here we propose a new integrative framework designed to perform three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; and (3) identifying interactions within and between meta-dimensional genomic features associated with survival. To predict censored survival time, martingale residuals were calculated as a new continuous outcome, and a new fitness function for the grammatical evolution neural network (GENN), based on the mean absolute difference of martingale residuals, was implemented. To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data, including copy number, gene expression, DNA methylation, and protein expression data, in breast cancer retrieved from The Cancer Genome Atlas (TCGA). In the breast cancer dataset, we identified survival-associated interactions not only within a single dimension of genomic data but also between meta-dimensional omics data. Notably, the predictive power of our best meta-dimensional model was 73%, which outperformed all of the other models built on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease, and the high level of genomic diversity within and between breast tumors could affect therapeutic response and the risk of disease progression. Thus, identifying interactions within and between meta-dimensional omics data associated with survival in breast cancer is expected to guide the development of improved meta-dimensional prognostic biomarkers and therapeutic targets.
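The martingale-residual outcome and the mean-absolute-difference fitness described in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the residuals come from a null model with no covariates, so the cumulative hazard is the Nelson-Aalen estimate, and the fitness is the plain mean of absolute differences between a model's outputs and those residuals. The function names are hypothetical and this is not the authors' GENN implementation.

```python
import numpy as np

def nelson_aalen_cumhaz(time, event):
    """Nelson-Aalen cumulative hazard, evaluated at each subject's own follow-up time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])  # sorted distinct event times
    # d_j: number of events at each event time; n_j: number still at risk
    d = np.array([np.sum((time == t) & (event == 1)) for t in event_times], dtype=float)
    n = np.array([np.sum(time >= t) for t in event_times], dtype=float)
    cumhaz = np.concatenate(([0.0], np.cumsum(d / n)))
    # H(t_i) sums the increments at all event times <= t_i
    idx = np.searchsorted(event_times, time, side="right")
    return cumhaz[idx]

def martingale_residuals(time, event):
    """Null-model martingale residual: event indicator minus cumulative hazard."""
    return np.asarray(event, dtype=float) - nelson_aalen_cumhaz(time, event)

def mean_absolute_difference(predicted, residuals):
    """Assumed fitness: mean |prediction - martingale residual| (lower is better)."""
    predicted = np.asarray(predicted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    return float(np.mean(np.abs(predicted - residuals)))

# Toy data: follow-up times and event indicators (1 = event, 0 = censored)
time = [2, 3, 5, 7]
event = [1, 0, 1, 1]
res = martingale_residuals(time, event)
print(res)
print(mean_absolute_difference(np.zeros(4), res))
```

Because the residuals are continuous and defined for censored and uncensored subjects alike, any regression-style learner can be scored against them. If clinical covariates were adjusted for, the residuals would instead come from a fitted Cox model.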

StableMate: a statistical method to select stable predictors in omics data

Case-Based Meta-Prediction for Bioinformatics.

Selecting Reliable mRNA Expression Measurements Across Platforms Improves Downstream Analysis

Stabilizing Variable Selection and Regression

Robust biomarker screening from gene expression data by stable machine learning-recursive feature elimination methods

Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data

A robust kernel machine regression towards biomarker selection in multi-omics datasets of osteoporosis for drug discovery

Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

Coupling bootstrap with synergy self-organizing map-based orthogonal partial least squares discriminant analysis: Stable metabolic biomarker selection for inherited metabolic diseases

biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data

Discovery of sparse, reliable omic biomarkers with Stabl

Stability SCAD: A Powerful Approach to Detect Interactions in Large-Scale Genomic Study

Analysis and Prediction of Protein Stability Based on Interaction Network, Gene Ontology, and KEGG Pathway Enrichment Scores.

Stabilized marker gene identification and functional annotation from single-cell transcriptomic data

BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data

Stable feature selection based on probability estimation in gene expression datasets

Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer

Statistical batch-aware embedded integration, dimension reduction and alignment for spatial transcriptomics

Development and Validation of Predictive Molecular Signatures

A stability-driven protocol for drug response interpretable prediction (staDRIP)

On the combination of omics data for prediction of binary outcomes