Abstract: Cancer classification plays an important role in cancer treatment, yet no general approach to the problem exists so far. Cancer classification involves two tasks: identifying new cancer classes and assigning tumors to known classes, which Golub et al. call class discovery and class prediction, respectively [1]. From a mathematical point of view, class discovery is a cluster analysis problem, while class prediction is usually called a classification problem (we use the latter name for consistency with the pattern recognition literature). Until now, cancer classification has been based primarily on the morphological appearance of the tumor [1], which has serious limitations because of ambiguity. Golub et al. [1] presented a new approach to cancer classification based on gene expression monitoring by DNA microarrays. They chose acute leukemia as a test case, with the goal of distinguishing ALL (acute lymphoblastic leukemia) from AML (acute myeloid leukemia), a typical cancer classification problem that has not been solved well despite many years of effort. This paper reports our work on the classification (prediction) part of this problem, following their original work. Golub et al. applied a feature selection (gene selection) procedure before classification. A metric was defined to evaluate the correlation of each gene with the classification. After some “good” genes were selected from all 6817 genes, classification was done by a weighted voting scheme. The classifier was trained on a 38-sample training set, and a separate 34-sample set was used for testing. In leave-one-out cross-validation on the training set with 50 selected genes, 36 of the 38 samples were correctly classified and 2 were rejected (no-call). On the test set, 29 of the 34 samples were correctly classified and the other 5 were rejected. If the classifier were forced to give a prediction for these 5 no-calls, the predictions would be wrong.
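The gene scoring and weighted voting described above can be sketched in a few lines. This is only an illustrative sketch, not the procedure of [1] as implemented: the abstract does not state the metric, so the signal-to-noise form used here, the midpoint decision boundary, the prediction-strength threshold of 0.3, and all function names (`signal_to_noise`, `train_voter`, `predict`) are assumptions. Selecting the top genes by absolute score is also a simplification.

```python
from statistics import mean, stdev

def signal_to_noise(a_vals, b_vals):
    """Assumed per-gene metric: (mean_a - mean_b) / (sd_a + sd_b)."""
    spread = stdev(a_vals) + stdev(b_vals)
    return 0.0 if spread == 0 else (mean(a_vals) - mean(b_vals)) / spread

def train_voter(samples, labels, n_genes):
    """Score every gene on its own and keep the n_genes with the largest |score|.

    samples: list of expression vectors (one list of floats per sample)
    labels:  list of "ALL" / "AML" strings, same length as samples
    """
    params = []
    for g in range(len(samples[0])):
        all_vals = [s[g] for s, y in zip(samples, labels) if y == "ALL"]
        aml_vals = [s[g] for s, y in zip(samples, labels) if y == "AML"]
        weight = signal_to_noise(all_vals, aml_vals)       # > 0 means higher in ALL
        boundary = (mean(all_vals) + mean(aml_vals)) / 2   # midpoint of class means
        params.append((g, weight, boundary))
    params.sort(key=lambda p: abs(p[1]), reverse=True)
    return params[:n_genes]

def predict(voter, sample, ps_threshold=0.3):
    """Each selected gene casts a weighted vote; weak margins become no-calls."""
    votes_all = votes_aml = 0.0
    for g, weight, boundary in voter:
        vote = weight * (sample[g] - boundary)
        if vote > 0:
            votes_all += vote
        else:
            votes_aml += -vote
    total = votes_all + votes_aml
    if total == 0:
        return "no-call", 0.0
    strength = abs(votes_all - votes_aml) / total
    if strength < ps_threshold:
        return "no-call", strength
    return ("ALL" if votes_all > votes_aml else "AML"), strength

# Toy data (made up for illustration): 3 genes, 3 samples per class.
toy_samples = [
    [10.0, 1.0, 5.0], [12.0, 2.0, 5.5], [11.0, 1.5, 4.5],   # ALL
    [2.0, 9.0, 5.2], [3.0, 11.0, 4.8], [1.0, 10.0, 5.0],    # AML
]
toy_labels = ["ALL"] * 3 + ["AML"] * 3
voter = train_voter(toy_samples, toy_labels, n_genes=2)
print(predict(voter, [11.5, 1.2, 5.0]))   # both genes agree
print(predict(voter, [11.0, 10.0, 5.0]))  # genes disagree, so the margin is weak
```

The rejection rule is what produces the no-calls reported above: a sample is left unclassified when the winning and losing vote totals are too close relative to their sum.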
Because the feature selection procedure is of the single-selection type and the classification method is an intuitive one, we believe there is still considerable room for improving performance. In our approach, we used all the genes for classification (the gene selection problem will be discussed in another paper) and applied the support vector machine (SVM) method and one of its improved versions, CSVM, as the classifiers. Thanks to the better generalization ability of SVM and CSVM, much better performance was obtained.
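To make the SVM step concrete, the following is a minimal sketch of a linear SVM trained on all features and evaluated by leave-one-out cross-validation. It is not the authors' implementation: the abstract does not give the kernel, the solver, the hyperparameters, or the evaluation protocol used for SVM, and CSVM is not covered here at all. The subgradient solver, the parameter values, the function names, and the toy data are all assumptions made for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    X: list of feature vectors; y: list of labels in {-1, +1}.
    lam, lr and epochs are illustrative defaults, not values from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Sign of the decision function; +1 could stand for ALL and -1 for AML."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def leave_one_out_accuracy(X, y, **svm_kwargs):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        w, b = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **svm_kwargs)
        correct += svm_predict(w, b, X[i]) == y[i]
    return correct / len(X)

# Toy, already-centred data (made up): two features, four samples per class.
X_toy = [[1.0, -1.0], [1.2, -0.8], [0.9, -1.1], [1.1, -0.9],
         [-1.0, 1.0], [-1.2, 0.8], [-0.9, 1.1], [-1.1, 0.9]]
y_toy = [1, 1, 1, 1, -1, -1, -1, -1]
print(leave_one_out_accuracy(X_toy, y_toy))
```

With thousands of genes and only 38 training samples, the regularisation term is what keeps such a classifier from simply memorising the training set, which is one way to read the generalization claim above. In practice the expression values would need to be scaled before training.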

Cancer detection with various classification models: A comprehensive feature analysis using HMM to extract a nucleotide pattern

A Kernelized Classification Approach for Cancer Recognition Using Markovian Analysis of DNA Structure Patterns as Feature Mining

Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree with Gene Selection

Comparative Study of Cancer Classification by Analysis of RNA-seq Gene Expression Levels

Ensemble Classification Model With CFS-IGWO-Based Feature Selection for Cancer Detection Using Microarray Data

Cancer classification based on multiple dimensions: SNV patterns

Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data

Efficient Classification of Hallmark of Cancer Using Embedding-Based Support Vector Machine for Multilabel Text

deep DNA machine learning model to classify the tumor genome of patients with tumor sequencing

The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

Cancer classification and pathway discovery using non-negative matrix factorization

Classification of human cancer diseases by gene expression profiles

A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis

Genetic Clustering Algorithm-Based Feature Selection and Divergent Random Forest for Multiclass Cancer Classification Using Gene Expression Data

ALL/AML Cancer Classification by Gene Expression Data Using SVM and CSVM Approach

Implementation of ensemble machine learning algorithms on exome datasets for predicting early diagnosis of cancers

Using the "Hidden" Genome to Improve Classification of Cancer Types

DNA-framework-based multidimensional molecular classifiers for cancer diagnosis

Gene Selection for Cancer Classification using Support Vector Machines

Diagnostic classification based on DNA methylation profiles using sequential machine learning approaches.

Multimodal Dimension Reduction and Subtype Classification of Head and Neck Squamous Cell Tumors