Comparison of Different Classification Methods for Breast Cancer Subtypes Prediction

Jing Xu, Peng Wu, Yuehui Chen, Li Zhang
DOI: https://doi.org/10.1109/SPAC46244.2018.8965553
2018-01-01
Abstract: Breast cancer is one of the most common cancers among women. Because cancers are heterogeneous, breast cancer is divided into subtypes. Different subtypes have different molecular origins, so the corresponding target cells and treatment plans differ. Identifying the correct subtype is therefore important for diagnosis and prognosis. Breast cancer can be divided into four subtypes: Basal, Her2, Luminal A, and Luminal B. Many machine learning approaches have been applied to cancer subtype classification in the past few years. We present a comparison of four classifiers, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Multi-Grained Cascade Forest (gcForest), on breast cancer data from The Cancer Genome Atlas (TCGA). Because biological data are high-dimensional and have small sample sizes, we apply a subtype-dependent feature selection method to reduce the dimensionality of the RNA-Seq gene expression data before classification. Experimental results show that gcForest achieves higher accuracy for breast cancer subtype prediction than the other classifiers.
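The pipeline described in the abstract (per-subtype feature selection followed by classification) can be illustrated with a minimal sketch. The abstract does not specify how the subtype-dependent feature selection works, so the code below assumes a simple one-vs-rest standardized mean difference as a stand-in. It uses synthetic data in place of TCGA RNA-Seq values and implements only KNN of the four classifiers. All function names and parameters (`make_toy_data`, `select_features_per_subtype`, `knn_predict`, `run_demo`, `top_k`) are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev

SUBTYPES = ["Basal", "Her2", "Luminal A", "Luminal B"]


def make_toy_data(n_per_class=20, n_genes=50, seed=0):
    """Synthetic stand-in for expression data (assumption, not TCGA).

    Each subtype gets three "marker" genes whose values are shifted upward.
    """
    rng = random.Random(seed)
    X, y = [], []
    for k, name in enumerate(SUBTYPES):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(3 * k, 3 * k + 3):
                row[g] += 4.0
            X.append(row)
            y.append(name)
    return X, y


def select_features_per_subtype(X, y, top_k=3):
    """Assumed selection rule: for each subtype, score every gene by the
    absolute mean difference (subtype vs. rest) divided by the summed
    standard deviations, then keep the union of each subtype's top_k genes.
    """
    selected = set()
    for label in sorted(set(y)):
        scores = []
        for g in range(len(X[0])):
            inside = [row[g] for row, lab in zip(X, y) if lab == label]
            outside = [row[g] for row, lab in zip(X, y) if lab != label]
            diff = abs(mean(inside) - mean(outside))
            spread = pstdev(inside) + pstdev(outside) + 1e-9
            scores.append((diff / spread, g))
        scores.sort(reverse=True)
        selected.update(g for _, g in scores[:top_k])
    return sorted(selected)


def knn_predict(train_X, train_y, x, k=5):
    """Plain k-nearest-neighbor majority vote with Euclidean distance."""
    nearest = sorted((math.dist(row, x), lab) for row, lab in zip(train_X, train_y))
    votes = Counter(lab for _, lab in nearest[:k])
    return votes.most_common(1)[0][0]


def run_demo(seed=0):
    X, y = make_toy_data(seed=seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.75 * len(idx))
    train, test = idx[:cut], idx[cut:]
    train_X = [X[i] for i in train]
    train_y = [y[i] for i in train]

    # Select genes on the training split only, to avoid leaking test labels.
    selected = select_features_per_subtype(train_X, train_y)

    def project(row):
        return [row[g] for g in selected]

    train_P = [project(row) for row in train_X]
    correct = sum(knn_predict(train_P, train_y, project(X[i])) == y[i] for i in test)
    return selected, correct / len(test)


if __name__ == "__main__":
    genes, accuracy = run_demo()
    print("selected genes:", genes)
    print("held-out accuracy:", accuracy)
```

Selecting features on the training split alone matters with small sample sizes, since scoring genes on the full dataset would inflate the reported accuracy. Comparing the other classifiers would mean swapping `knn_predict` for another model while keeping the same split and the same selected genes.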