Abstract: Multi-biomarker panels can capture the nonlinear synergy among biomarkers, and they are important aids in the early diagnosis of, and ultimately the battle against, complex diseases. However, identifying these multi-biomarker panels from case and control data is challenging. For example, exhaustive search is computationally infeasible when the data dimension is high. Here, we propose a novel method, MILP_k, to identify a serum-based multi-biomarker panel that distinguishes colorectal cancers (CRC) from benign colorectal tumors. Specifically, the multi-biomarker panel detection problem is modeled as a mixed integer program that maximizes classification accuracy. We then measured serum profiling data for 101 CRC patients and 95 benign patients. The 61 biomarkers were analyzed individually, and their combinations were then analyzed by our method. We discovered 4 biomarkers as the optimal small multi-biomarker panel, including the known CRC biomarkers CEA and IL-10 as well as the novel biomarkers IMA and NSE. This multi-biomarker panel achieves a leave-one-out cross-validation (LOOCV) accuracy of 0.7857 with a nearest centroid classifier. An independent test of this panel by support vector machine (SVM) with threefold cross-validation yields an AUC of 0.8438. This improves the predictive accuracy by 20% over the single best biomarker. Further extension of this 4-biomarker panel to a larger 13-biomarker panel improves the LOOCV accuracy to 0.8673, with an independent AUC of 0.8437. Comparison with the exhaustive search method shows that our method reduces the search time 1000-fold. Experiments on the early cancer stage samples reveal two panels of biomarkers and show promising accuracy. Given the number of biomarkers to select, the proposed method selects the subset of biomarkers with the best accuracy in distinguishing case from control samples. Both the receiver operating characteristic curve and the precision-recall curve show our method's consistent gain in accuracy.
Our method also shows an advantage in capturing synergy among the selected biomarkers. The multi-biomarker panel far outperforms the simple combination of the best single features. Close investigation of the multi-biomarker panel shows that our method can remove redundancy and reveal complementary biomarker combinations. In addition, our method is efficient and can select multi-biomarker panels with more than 5 biomarkers, for which exhaustive methods fail. In conclusion, we propose a promising model to improve clinical data interpretability and to serve as a useful tool for other complex disease studies. Our small multi-biomarker panel, CEA, IL-10, IMA, and NSE, may provide insights into the disease status of colorectal diseases. The implementation of our method in MATLAB is available via the website: http://doc.aporc.org/wiki/MILP_k.
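
To make the evaluation setting concrete, the following is a minimal Python sketch of the exhaustive-search baseline mentioned above, scored by the LOOCV accuracy of a nearest centroid classifier. It is not the MILP_k formulation and not the authors' MATLAB implementation. The function names (`loocv_accuracy`, `exhaustive_best_panel`) and the toy data are hypothetical and chosen only for illustration. The toy data are built so that features 0 and 1 are each only moderately informative alone but separate the classes jointly, while feature 2 is noise.

```python
from itertools import combinations
from math import comb


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_accuracy(X, y, features):
    """LOOCV accuracy of a nearest centroid classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        by_class = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j != i:
                by_class.setdefault(label, []).append([row[f] for f in features])
        held_out = [X[i][f] for f in features]
        predicted = min(by_class, key=lambda c: sq_dist(held_out, centroid(by_class[c])))
        correct += predicted == y[i]
    return correct / len(X)


def exhaustive_best_panel(X, y, k):
    """Score every k-subset of features and return the best one (the baseline)."""
    n_features = len(X[0])
    return max(combinations(range(n_features), k),
               key=lambda panel: loocv_accuracy(X, y, panel))


# Hypothetical toy data: 4 cases (label 1) and 4 controls (label 0), 3 features.
X = [[2, 0, 1], [3, 1, 0], [4, 2, 1], [5, 3, 0],
     [0, 2, 0], [1, 3, 1], [2, 4, 0], [3, 5, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

best = exhaustive_best_panel(X, y, 2)
print("best 2-feature panel:", best, "LOOCV accuracy:", loocv_accuracy(X, y, best))
print("feature 0 alone:", loocv_accuracy(X, y, (0,)))
# Number of candidate 4-biomarker panels among 61 biomarkers.
print("4-subsets of 61 biomarkers:", comb(61, 4))
```

Because every k-subset is scored with a full LOOCV pass, the cost of this baseline grows with the binomial coefficient C(61, k), which is why exhaustive search becomes impractical for larger panels.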

A novel mixed integer programming for multi-biomarker panel identification by distinguishing malignant from benign colorectal tumors
