Abstract: There is increasing interest in shifting the emphasis of tumor classification from morphologic to molecular criteria. Gene expression profiles may offer more information than morphology and provide an alternative to morphology-based tumor classification systems. Gene selection involves a search for gene subsets that can discriminate tumor tissue from normal tissue and that may have either a clear biological interpretation or some implication in the molecular mechanism of tumorigenesis. Gene selection is a fundamental issue in gene expression-based tumor classification. In forming a discriminant rule, the number of genes is large relative to the number of tissue samples. Too many genes can harm the performance of the tumor classification system and increase its cost. In this report, we discuss criteria and illustrate techniques for reducing the number of genes and selecting an optimal (or near-optimal) subset of genes from an initial set for tumor classification. The practical advantages of gene selection over other methods of reducing dimensionality (e.g., principal components) include its simplicity, future cost savings, and higher likelihood of being adopted in a clinical setting. We analyze the expression profiles of 2000 genes in 22 normal and 40 colon tumor tissues, 5776 sequences in 14 human mammary epithelial cells and 13 breast tumors, and 6817 genes in 47 acute lymphoblastic leukemia and 25 acute myeloid leukemia samples. Through these three examples, we show that using 2 or 3 genes can achieve more than 90% classification accuracy. This result implies that, after an initial investigation of tumor classification using microarrays, a small number of selected genes may be used as biomarkers for tumor classification, or may have some relevance to tumor development and serve as potential drug targets.
In this report, we also show that stepwise Fisher's linear discriminant function is a practicable method for gene expression-based tumor classification.
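The idea of stepwise gene selection with Fisher's linear discriminant can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the procedure used in the report: the data are synthetic (20 samples, 30 genes, with gene index 7 made informative), the entry criterion is leave-one-out accuracy, equal class priors are assumed, and all function names (`fisher_rule`, `loo_accuracy`, `stepwise_select`) are hypothetical.

```python
import random

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_scatter(a, b, ma, mb, ridge=1e-6):
    """Within-class scatter matrix S_w; the small ridge is an added safeguard."""
    d = len(ma)
    s = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, m in ((a, ma), (b, mb)):
        for r in rows:
            for i in range(d):
                for j in range(d):
                    s[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    return s

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fisher_rule(a, b):
    """Fisher direction w = S_w^{-1}(m_a - m_b) and midpoint threshold."""
    ma, mb = mean_vec(a), mean_vec(b)
    w = solve(pooled_scatter(a, b, ma, mb), [p - q for p, q in zip(ma, mb)])
    mid = [(p + q) / 2 for p, q in zip(ma, mb)]
    return w, sum(wi * mi for wi, mi in zip(w, mid))

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of the Fisher rule restricted to `genes`."""
    correct = 0
    for i in range(len(X)):
        a = [[X[k][g] for g in genes] for k in range(len(X)) if k != i and y[k] == 1]
        b = [[X[k][g] for g in genes] for k in range(len(X)) if k != i and y[k] == 0]
        w, t = fisher_rule(a, b)
        score = sum(wi * X[i][g] for wi, g in zip(w, genes))
        correct += int((score > t) == (y[i] == 1))
    return correct / len(X)

def stepwise_select(X, y, max_genes=3):
    """Forward selection: add the gene that most improves accuracy, else stop."""
    chosen, best_acc = [], 0.0
    for _ in range(max_genes):
        candidates = [g for g in range(len(X[0])) if g not in chosen]
        acc, g = max((loo_accuracy(X, y, chosen + [g]), g) for g in candidates)
        if chosen and acc <= best_acc:
            break
        chosen.append(g)
        best_acc = acc
    return chosen, best_acc

# Synthetic data (assumption): label 1 = tumor, 0 = normal; gene 7 is shifted.
random.seed(0)
y = [1] * 10 + [0] * 10
X = [[random.gauss(0, 1) for _ in range(30)] for _ in y]
for row, label in zip(X, y):
    if label == 1:
        row[7] += 10.0

chosen, acc = stepwise_select(X, y)
print(chosen, acc)
```

Because genes are chosen using the same samples on which accuracy is measured, the accuracy returned by this sketch is optimistic; an unbiased estimate would repeat the selection inside each cross-validation fold.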

How Many Genes Are Needed for a Discriminant Microarray Data Analysis?

Comment on `Quantum-Anti-Zeno Paradox'

Extreme Value Distribution Based Gene Selection Criteria for Discriminant Microarray Data Analysis Using Logistic Regression

A Generalized Approach for Measuring Relationships among Genes

Feature (gene) Selection in Gene Expression-Based Tumor Classification

Algorithm for Finding Optimal Gene Sets in Microarray Prediction

Discriminant analysis to evaluate clustering of gene expression data

Gene Selection Algorithm Based on Correlation Analysis

Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods

Gene Features Selection for Three-Class Disease Classification via Multiple Orthogonal Partial Least Square Discriminant Analysis and S-Plot Using Microarray Data

Model-Free Gene Selection Method by Considering Unbalanced Samples

Identifying differentially expressed genes in human acute leukemia and mouse brain microarray datasets utilizing QTModel

Gene selection and classification for cancer microarray data based on machine learning and similarity measures

The minimal number of genes needed to identify a tumor

Gene Selection for Cancer Classification using Support Vector Machines

Parameters Selection in Gene Selection Using Gaussian Kernel Support Vector Machines by Genetic Algorithm

Effect of training-sample size and classification difficulty on the accuracy of genomic predictors

Machine Learning Techniques To Identify Marker Genes For Diagnostic Classification Of Microarrays

Feature Selection and Classification of MAQC-II Breast Cancer and Multiple Myeloma Microarray Gene Expression Data

Pathway-based feature selection algorithms identify genes discriminating patients with multiple sclerosis apart from controls

Study of Informative Gene Selection for Gene Expression Profiles