Abstract: There is increasing interest in shifting the emphasis of tumor classification from morphologic to molecular. Gene expression profiles may offer more information than morphology and provide an alternative to morphology-based tumor classification systems. Gene selection involves a search for gene subsets that can discriminate tumor tissue from normal tissue and that may have either a clear biological interpretation or some implication for the molecular mechanism of tumorigenesis. Gene selection is a fundamental issue in gene expression-based tumor classification: when a discriminant rule is formed, the number of genes is large relative to the number of tissue samples. Too many genes can degrade the performance of a tumor classification system and also increase its cost. In this report, we discuss criteria and illustrate techniques for reducing the number of genes and selecting an optimal (or near-optimal) subset from an initial set of genes for tumor classification. The practical advantages of gene selection over other methods of reducing dimensionality (e.g., principal components) include its simplicity, future cost savings, and a higher likelihood of adoption in a clinical setting. We analyze the expression profiles of 2000 genes in 22 normal and 40 colon tumor tissues, 5776 sequences in 14 human mammary epithelial cells and 13 breast tumors, and 6817 genes in 47 acute lymphoblastic leukemia and 25 acute myeloid leukemia samples. Through these three examples, we show that 2 or 3 genes can achieve a classification accuracy above 90%. This result implies that, after an initial investigation of tumor classification with microarrays, a small number of selected genes may serve as biomarkers for tumor classification, or may be relevant to tumor development and serve as potential drug targets.
In this report, we also show that the stepwise Fisher's linear discriminant function is a practical method for gene expression-based tumor classification.
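The stepwise Fisher's linear discriminant idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: genes are added one at a time by forward selection, each candidate subset is scored by the leave-one-out accuracy of a two-class Fisher linear discriminant (the report's own entry criterion may differ), both classes get equal priors so the threshold is the projected midpoint of the class means, and a small `ridge` term is added to the pooled covariance because samples are few. The helper names (`solve`, `fisher_fit`, `loo_accuracy`, `forward_select`) and the toy matrices `CLASS0` and `CLASS1` are hypothetical; the toy values are not drawn from the colon, breast, or leukemia data sets.

```python
"""Forward stepwise gene selection with a two-class Fisher linear discriminant.

Sketch only: the entry criterion (leave-one-out accuracy), equal class priors,
the ridge term, and the toy data are assumptions, not the report's exact rule.
"""
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]

# Hypothetical toy data: rows are samples, columns are genes 0..2.
CLASS0 = [[1.0, 5.0, 2.0], [1.2, 3.0, 2.5], [0.8, 4.0, 1.5], [1.1, 6.0, 2.2]]
CLASS1 = [[3.0, 4.5, 2.1], [3.2, 5.5, 1.9], [2.9, 3.5, 2.4], [3.1, 4.2, 1.6]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("pooled covariance is singular")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_fit(x0: Matrix, x1: Matrix, ridge: float = 1e-6) -> Tuple[List[float], float]:
    """Return w = S_w^{-1} (mu1 - mu0) and the threshold w . (mu0 + mu1) / 2."""
    p = len(x0[0])
    mu0 = [sum(r[j] for r in x0) / len(x0) for j in range(p)]
    mu1 = [sum(r[j] for r in x1) / len(x1) for j in range(p)]
    dof = len(x0) + len(x1) - 2
    sw = [[0.0] * p for _ in range(p)]  # pooled within-class covariance
    for rows, mu in ((x0, mu0), (x1, mu1)):
        for r in rows:
            d = [r[j] - mu[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j] / dof
    for i in range(p):
        sw[i][i] += ridge  # assumed ridge keeps S_w invertible when samples are few
    w = solve(sw, [mu1[j] - mu0[j] for j in range(p)])
    return w, sum(w[j] * (mu0[j] + mu1[j]) / 2 for j in range(p))


def cols(rows: Matrix, genes: Sequence[int]) -> Matrix:
    """Keep only the columns (genes) listed in `genes`."""
    return [[r[g] for g in genes] for r in rows]


def loo_accuracy(x0: Matrix, x1: Matrix, genes: Sequence[int]) -> float:
    """Leave-one-out accuracy of Fisher's rule restricted to `genes`."""
    a, b = cols(x0, genes), cols(x1, genes)
    correct = 0
    for label, own, other in ((0, a, b), (1, b, a)):
        for k in range(len(own)):
            rest = own[:k] + own[k + 1:]  # hold out sample k of this class
            t0, t1 = (rest, other) if label == 0 else (other, rest)
            w, thr = fisher_fit(t0, t1)
            score = sum(wi * xi for wi, xi in zip(w, own[k]))
            correct += int((1 if score > thr else 0) == label)
    return correct / (len(a) + len(b))


def forward_select(x0: Matrix, x1: Matrix, max_genes: int = 3) -> Tuple[List[int], float]:
    """Add, one at a time, the gene that most improves LOO accuracy; stop when none does."""
    selected: List[int] = []
    best_acc = 0.0
    while len(selected) < max_genes:
        best_gene: Optional[int] = None
        for g in range(len(x0[0])):
            if g in selected:
                continue
            acc = loo_accuracy(x0, x1, selected + [g])
            if acc > best_acc:  # strict: ties keep the lower-indexed gene
                best_acc, best_gene = acc, g
        if best_gene is None:
            break
        selected.append(best_gene)
    return selected, best_acc


if __name__ == "__main__":
    print(forward_select(CLASS0, CLASS1))  # ([0], 1.0)
```

On the toy data, gene 0 alone separates the two classes, so the search stops after one step. With real microarray data, capping `max_genes` at 2 or 3 would mirror the small subsets reported in the abstract, and refitting inside each leave-one-out fold keeps the accuracy estimate from being scored on the same samples used to fit the rule.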

Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data

Nim: A Node Influence Based Method for Cancer Classification

Comparative Study of Cancer Classification by Analysis of RNA-seq Gene Expression Levels

Classification of human cancer diseases by gene expression profiles

Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree with Gene Selection

A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches

Clustering cancer gene expression data: a comparative study

Multiclass cancer diagnosis using tumor gene expression signatures

Feature (gene) Selection in Gene Expression-Based Tumor Classification

Cancer prediction with gene expression profiling and differential evolution

Applying the Deep Learning Techniques to Solve Classification Tasks Using Gene Expression Data

Deep-Learning-Based Cancer Profiles Classification Using Gene Expression Data Profile

Gene selection and classification for cancer microarray data based on machine learning and similarity measures

Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review

Cancer Classification Using Entropy Analysis in Fractional Fourier Domain of Gene Expression Profile

Multiclass Decision Forest - A Novel Pattern Recognition Method For Multiclass Classification In Microarray Data Analysis

Class prediction of an independent sample using a set of gene modules consisting of gene-pairs which were condition (Tumor, Normal) specific

Deep learning techniques for cancer classification using microarray gene expression data

Comparison of the classifiers based on mRNA, microRNA and lncRNA expression and DNA methylation profiles for the tumor origin detection

Gene selection for cancer classification using a hybrid of univariate and multivariate feature selection methods

Cancer classification from the gene expression profiles by Discriminant Kernel-PLS