Abstract: There is increasing interest in changing the emphasis of tumor classification from morphologic to molecular. Gene expression profiles may offer more information than morphology and provide an alternative to morphology-based tumor classification systems. Gene selection involves a search for gene subsets that are able to discriminate tumor tissue from normal tissue, and that may have either a clear biological interpretation or some implication in the molecular mechanism of tumorigenesis. Gene selection is a fundamental issue in gene expression-based tumor classification. In the formation of a discriminant rule, the number of genes is large relative to the number of tissue samples. Too many genes can harm the performance of the tumor classification system and increase its cost. In this report, we discuss criteria and illustrate techniques for reducing the number of genes and selecting an optimal (or near-optimal) subset of genes from an initial set of genes for tumor classification. The practical advantages of gene selection over other methods of reducing dimensionality (e.g., principal components) include its simplicity, future cost savings, and higher likelihood of being adopted in a clinical setting. We analyze the expression profiles of 2000 genes in 22 normal and 40 colon tumor tissues, 5776 sequences in 14 human mammary epithelial cells and 13 breast tumors, and 6817 genes in 47 acute lymphoblastic leukemia and 25 acute myeloid leukemia samples. Through these three examples, we show that using 2 or 3 genes can achieve a classification accuracy of more than 90%. This result implies that, after initial investigation of tumor classification using microarrays, a small number of selected genes may be used as biomarkers for tumor classification, or may have some relevance in tumor development and serve as potential drug targets.
In this report we also show that stepwise Fisher's linear discriminant function is a practicable method for gene expression-based tumor classification.
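
As a rough illustration of stepwise selection with Fisher's linear discriminant function, the sketch below greedily adds the gene that most increases the two-class Fisher criterion and then classifies samples with the resulting discriminant. The abstract does not state the exact stepwise criterion, so the greedy forward rule, the small ridge term, the synthetic data, and all function names (`fisher_score`, `forward_select`, `fit_fisher_ldf`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def fisher_score(X, y, genes):
    """Two-class Fisher criterion for a gene subset: squared Mahalanobis
    distance between class means under the pooled within-class covariance."""
    X0, X1 = X[y == 0][:, genes], X[y == 1][:, genes]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X0) + len(X1) - 2)
    Sw = Sw + 1e-8 * np.eye(len(genes))  # assumed ridge term for numerical stability
    diff = m1 - m0
    return float(diff @ np.linalg.solve(Sw, diff))

def forward_select(X, y, n_genes):
    """Greedy forward selection: at each step add the gene that yields
    the largest Fisher criterion together with the genes already chosen."""
    selected = []
    for _ in range(n_genes):
        candidates = [g for g in range(X.shape[1]) if g not in selected]
        scores = [fisher_score(X, y, selected + [g]) for g in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

def fit_fisher_ldf(X, y, genes):
    """Return weights w and cutoff c; a sample x is assigned to class 1 when w @ x > c."""
    X0, X1 = X[y == 0][:, genes], X[y == 1][:, genes]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X0) + len(X1) - 2)
    Sw = Sw + 1e-8 * np.eye(len(genes))
    w = np.linalg.solve(Sw, m1 - m0)
    c = float(w @ (m0 + m1) / 2.0)  # midpoint between the projected class means
    return w, c

# Synthetic stand-in for an expression matrix (not the paper's data):
# 22 "normal" and 40 "tumor" samples, 200 genes, 3 of them informative.
rng = np.random.default_rng(0)
y = np.array([0] * 22 + [1] * 40)
X = rng.normal(size=(62, 200))
informative = [5, 50, 120]
X[np.ix_(y == 1, informative)] += 3.0  # shift the informative genes in class 1

selected = forward_select(X, y, n_genes=3)
w, c = fit_fisher_ldf(X, y, selected)
predictions = (X[:, selected] @ w > c).astype(int)
accuracy = float((predictions == y).mean())
print("selected genes:", sorted(selected))
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the same samples used for selection and fitting, so it is optimistic. An honest estimate would repeat the gene selection inside each fold of a cross-validation.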

Gene Selection for Sample Classification Based on Gene Expression Data: Study of Sensitivity to Choice of Parameters of the Ga/Knn Method

Gene selection and sample classification on microarray data based on adaptive genetic algorithm/k-nearest neighbor method

Parameters Selection in Gene Selection Using Gaussian Kernel Support Vector Machines by Genetic Algorithm

Gene Selection Algorithm Based on Correlation Analysis

A Cancer Gene Selection Algorithm Based on the K-S Test and CFS

Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods

Gene Selection for Cancer Classification using Support Vector Machines

Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree with Gene Selection

Gene selection using independent variable group analysis for tumor classification

Feature (gene) Selection in Gene Expression-Based Tumor Classification

Gene selection and classification for cancer microarray data based on machine learning and similarity measures

Bi-level gene selection of cancer by combining clustering and sparse learning

Model-Free Gene Selection Method by Considering Unbalanced Samples

Variable Selection in Logistic Regression Model with Genetic Algorithm.

Gene Selection Using Gaussian Kernel Support Vector Machine Based Recursive Feature Elimination with Adaptive Kernel Width Strategy

A two-stage gene selection scheme utilizing MRMR filter and GA wrapper

A multi-population χ2 test approach to informative gene selection

Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm

Non-parametric statistical tests for informative gene selection

Signature Genes Selection and Functional Analysis of Phenotypes: A Comparative Study

Gene selection and cancer classification using Monte Carlo and nonnegative matrix factorization