SPLSN: An efficient tool for survival analysis and biomarker selection

Hai‐Hui Huang, Xin‐Dong Peng, Yong Liang
DOI: https://doi.org/10.1002/int.22532
IF: 8.993
2021-06-13
International Journal of Intelligent Systems
Abstract: In genome research, identifying a small number of important survival-related biomarkers is a fundamental problem. The Cox model is a widely used survival analysis technique for studying the relationship between characteristics and survival response. However, existing Cox methods have the following limitations for genomic data: (1) a typical gene expression data set consists of tens of thousands of genes, and the results of current methods may not be sparse enough; (2) the wealth of structural information about many biological processes, such as regulatory networks and pathways, is often ignored; (3) genomic data are usually highly noisy, and current methods usually do not account for this noise. To alleviate these problems, we study a novel sparse Cox regression model, called SPLSN, which combines self-paced learning (SPL) with a log-sum absolute network-based penalty (Logsum-Net) and is designed for biomarker selection in survival analysis. SPL follows a curriculum design: the model is trained by gradually adding samples, from low noise to high noise, during the training process. The Logsum-Net penalty encourages smoothness among the coefficients of adjacent genes on a specific biological network. We compare the proposed method with five alternative approaches in various experimental scenarios, including a comprehensive simulation, seven benchmark gene expression data sets, and one large validation data set. The results show that SPLSN selects fewer, yet meaningful, biomarkers and achieves the best or equivalent prediction performance. Moreover, the biological analysis shows that the genes selected by SPLSN might be helpful for tumor treatment.
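The self-paced schedule described in the abstract, where training starts from low-noise samples and gradually admits noisier ones, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common hard-threshold form of SPL, in which a sample is admitted when its loss falls below a threshold that grows each round. The function names, the threshold schedule, and the fixed per-sample losses are hypothetical choices made for illustration; in an actual SPL procedure the losses would be recomputed after each model refit.

```python
def spl_weights(losses, threshold):
    """Hard self-paced weights: 1 if a sample's loss is below the threshold, else 0."""
    return [1 if loss < threshold else 0 for loss in losses]


def self_paced_schedule(losses, start, growth, rounds):
    """Return, for each round, the indices of samples admitted into training.

    Assumption: losses are held fixed here; a real SPL loop would refit the
    model on the admitted samples and recompute the losses every round.
    """
    threshold = start
    schedule = []
    for _ in range(rounds):
        weights = spl_weights(losses, threshold)
        schedule.append([i for i, w in enumerate(weights) if w == 1])
        threshold *= growth  # relax the threshold so noisier samples enter later
    return schedule


# Hypothetical per-sample losses; a small loss stands in for "low noise".
example_losses = [0.2, 1.5, 0.7, 3.0]
example_schedule = self_paced_schedule(example_losses, start=0.5, growth=2, rounds=3)
```

With these example values, the thresholds are 0.5, 1.0, and 2.0, so the easiest sample enters first and the highest-loss sample is never admitted within three rounds.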
computer science, artificial intelligence