A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data

Yawen Xiao,Jun Wu,Zongli Lin,Xiaodong Zhao
DOI: https://doi.org/10.1016/j.cmpb.2018.10.004
IF: 6.1
2018-11-01
Computer Methods and Programs in Biomedicine
Abstract:BACKGROUND AND OBJECTIVE: Cancer has become a complex health problem due to its high mortality. Over the past few decades, with the rapid development of the high-throughput sequencing technology and the application of various machine learning methods, remarkable progress in cancer research has been made based on gene expression data. At the same time, a growing amount of high-dimensional data has been generated, such as RNA-seq data, which calls for superior machine learning methods able to deal with mass data effectively in order to make accurate treatment decision.METHODS: In this paper, we present a semi-supervised deep learning strategy, the stacked sparse auto-encoder (SSAE) based classification, for cancer prediction using RNA-seq data. The proposed SSAE based method employs the greedy layer-wise pre-training and a sparsity penalty term to help capture and extract important information from the high-dimensional data and then classify the samples.RESULTS: We tested the proposed SSAE model on three public RNA-seq data sets of three types of cancers and compared the prediction performance with several commonly-used classification methods. The results indicate that our approach outperforms the other methods for all the three cancer data sets in various metrics.CONCLUSIONS: The proposed SSAE based semi-supervised deep learning model shows its promising ability to process high-dimensional gene expression data and is proved to be effective and accurate for cancer prediction.
engineering, biomedical,computer science, interdisciplinary applications,medical informatics, theory & methods
What problem does this paper attempt to address?