Abstract: Cancer classification plays an important role in cancer treatment, yet no general approach to the problem exists. Cancer classification involves two tasks: identifying new cancer classes and assigning tumors to known classes, which Golub et al. [1] call class discovery and class prediction. From a mathematical point of view, class discovery is a cluster analysis problem, while class prediction is usually called a classification problem (we use the latter name to stay consistent with the pattern recognition literature). Until now, cancer classification has been based primarily on the morphological appearance of the tumor [1], which has serious limitations because of ambiguity. Golub et al. [1] presented a new approach to cancer classification based on gene expression monitoring by DNA microarrays. They chose acute leukemia as a test case, with the goal of distinguishing ALL (acute lymphoblastic leukemia) from AML (acute myeloid leukemia), a typical cancer classification problem that has not been well solved despite many years of effort. This paper reports our work on the classification (prediction) part of this problem, following their original work. Golub et al. applied a feature selection (gene selection) procedure before classification. A metric was defined to evaluate the correlation of each gene with the class distinction. After some “good” genes were selected from all 6817 genes, classification was done by a weighted voting scheme. The classifier was trained on a 38-sample training set, and a separate 34-sample set was used for testing. With leave-one-out cross-validation on the training set with 50 selected genes, 36 of 38 samples were correctly classified and 2 were rejected (no-call). On the test set, 29 of 34 samples were correctly classified and the other 5 were rejected. If the classifier were compelled to give these 5 no-calls a prediction, the predictions would be wrong.
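The abstract does not spell out the gene-scoring metric or the voting rule, so the following is only a minimal sketch of how such a scheme could look. It assumes a signal-to-noise score (difference of class means divided by the sum of class standard deviations) and a prediction-strength threshold for no-calls, as commonly attributed to Golub et al. [1]. The function names, the toy data, the 0.3 threshold, and the choice of ranking genes by absolute score are illustrative assumptions, not details taken from this paper.

```python
from statistics import mean, stdev

def signal_to_noise(values_a, values_b):
    """Per-gene score (mean_a - mean_b) / (sd_a + sd_b).
    Needs at least two samples per class; returns 0.0 if both spreads are zero."""
    spread = stdev(values_a) + stdev(values_b)
    if spread == 0:
        return 0.0
    return (mean(values_a) - mean(values_b)) / spread

def train_weighted_voting(samples, labels, n_genes):
    """Score every gene, keep the n_genes with the largest absolute score.
    Returns a list of (gene_index, weight, decision_boundary) tuples."""
    scored = []
    for g in range(len(samples[0])):
        all_vals = [s[g] for s, y in zip(samples, labels) if y == "ALL"]
        aml_vals = [s[g] for s, y in zip(samples, labels) if y == "AML"]
        weight = signal_to_noise(all_vals, aml_vals)
        boundary = (mean(all_vals) + mean(aml_vals)) / 2
        scored.append((g, weight, boundary))
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:n_genes]

def predict(model, sample, threshold=0.3):
    """Each gene casts a vote weight * (x - boundary); positive votes favor ALL.
    Returns "ALL", "AML", or None (no-call) when prediction strength is low."""
    votes_all = votes_aml = 0.0
    for g, weight, boundary in model:
        vote = weight * (sample[g] - boundary)
        if vote > 0:
            votes_all += vote
        else:
            votes_aml -= vote
    total = votes_all + votes_aml
    if total == 0:
        return None
    strength = abs(votes_all - votes_aml) / total
    if strength < threshold:
        return None
    return "ALL" if votes_all > votes_aml else "AML"

# Toy data (hypothetical): 4 samples, 3 "genes"; gene 2 is uninformative.
samples = [[5.0, 1.0, 3.0], [6.0, 2.0, 3.1], [1.0, 5.0, 2.9], [2.0, 6.0, 3.0]]
labels = ["ALL", "ALL", "AML", "AML"]
model = train_weighted_voting(samples, labels, n_genes=2)
clear_all = predict(model, [5.5, 1.5, 3.0])
clear_aml = predict(model, [1.5, 5.5, 3.0])
ambiguous = predict(model, [3.6, 3.6, 3.0])  # votes cancel, so this is a no-call
```

The no-call branch mirrors the rejected samples reported above: when the winning and losing vote totals are close, the sketch declines to predict rather than guess.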
Because the feature selection procedure selects genes individually and the classification method is an intuitive one, we believe there is still much room for improving performance. In our approach, we used all the genes for classification (the selection problem will be discussed in another paper) and applied the support vector machine (SVM) method and one of its improved versions, CSVM, as the classifier. Thanks to the better generalization ability of SVM and CSVM, much better performance was obtained.
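To make the idea of classifying on all genes without prior selection concrete, here is a minimal sketch of a linear soft-margin SVM evaluated by leave-one-out cross-validation. It is not the authors' implementation: the training routine (full-batch subgradient descent on the regularized hinge loss), the hyperparameters `lam` and `n_iters`, the function names, and the toy data are all assumptions chosen to keep the example self-contained. CSVM is not sketched because the abstract does not describe it.

```python
def train_linear_svm(samples, labels, lam=0.01, n_iters=200):
    """Linear soft-margin SVM via full-batch subgradient descent on the hinge loss.
    labels must be +1 or -1. A constant 1.0 is appended to every sample so the
    last weight acts as a bias (it is regularized too, a simplification)."""
    xs = [list(s) + [1.0] for s in samples]
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for t in range(1, n_iters + 1):
        eta = 1.0 / (lam * t)              # decaying step size
        grad = [lam * wj for wj in w]      # gradient of the L2 regularizer
        for x, y in zip(xs, labels):
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            if margin < 1.0:               # sample violates the margin
                for j in range(d):
                    grad[j] -= y * x[j] / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w

def predict_svm(w, sample):
    """Return +1 or -1 from the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(sample) + [1.0]))
    return 1 if score >= 0 else -1

def leave_one_out_correct(samples, labels):
    """Train on all samples but one, test on the held-out one, repeat for each."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        w = train_linear_svm(train_x, train_y)
        if predict_svm(w, samples[i]) == labels[i]:
            correct += 1
    return correct

# Toy data (hypothetical): +1 stands for ALL, -1 for AML; every feature is used.
toy_x = [[2.0, 0.1, 1.0], [2.2, -0.1, 1.1], [1.8, 0.0, 0.9],
         [-2.0, 0.1, 1.0], [-2.1, 0.0, 1.0], [-1.9, -0.1, 1.1]]
toy_y = [1, 1, 1, -1, -1, -1]
n_correct = leave_one_out_correct(toy_x, toy_y)
```

With thousands of genes and only 38 training samples, a maximum-margin linear classifier like this can still be trained because the regularizer controls complexity. In practice an established SVM library would be a more reliable choice than this hand-written loop.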

Cancer Subtype Recognition and Feature Selection with Gene Expression Profiles

Gene Selection for Leukemia Subtype Classification from Gene Expression Profile

Subtype Dependent Biomarker Identification and Tumor Classification from Gene Expression Profiles.

Selection of Feature Genes in Cancer Classification

Using feature selection and Bayesian network identify cancer subtypes based on proteomic data

Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree with Gene Selection

Identifying and Analyzing Different Cancer Subtypes Using RNA-seq Data of Blood Platelets.

Feature (gene) Selection in Gene Expression-Based Tumor Classification

Informative Gene Selection for Cancer Subtype Classification with BP Neural Networks

A Feature Selection Method for Colon Tumor Based on Gene Expression Profiles

Neighborhood Rough Set Model Based Gene Selection for Multi-subtype Tumor Classification

Feature selection for cancer classification based on support vector machine

Analysis of Gene Expression Profiles of Lung Cancer Subtypes with Machine Learning Algorithms.

Feature Selection for Cancer Classification Based on Fuzzy Rough Sets

ALL/AML Cancer Classification by Gene Expression Data Using SVM and CSVM Approach

A Novel Feature Selection Method Based on CFS in Cancer Recognition

An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data

Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods

FEATURE SELECTION FOR CLUSTERING DISEASE SAMPLES BASED ON GENE ONTOLOGY

An Effective Gene Selection Method for Cancer Subtype Classification Based on Predatory Search Genetic Algorithm and Support Vector Machine

The Classification of Tumor Using Gene Expression Profile Based on Support Vector Machines and Factor Analysis.