A Hybrid Gene Selection Method for Cancer Classification Based on Clustering Algorithm and Euclidean Distance

Ang Yang, Tao Cao, Renfa Li, Bo Liao
DOI: https://doi.org/10.1166/jctn.2012.2069
2012-01-01
Journal of Computational and Theoretical Nanoscience
Abstract: Classification of gene expression data is an important research field in bioinformatics. A gene chip can measure the expression of thousands of genes in a single experiment, which is of great practical significance for cancer classification and diagnosis. However, gene expression data are high-throughput, high-dimensional, nonlinear, noisy, and unevenly distributed, which makes them difficult to process. It is difficult to identify, from a gene expression profile, a set of feature genes that has classification capability with minimum redundancy; such genes also play a key role in cancer diagnosis and in research on pathogenesis mechanisms. In this paper, we propose a new hybrid gene selection method based on a clustering algorithm. We first use a filter method to rank the genes by their expression difference, and then apply a clustering method to the gene expression data. Afterwards, we select and substitute combinative genes according to Euclidean distance. The characteristics of the gene selection method are validated using the leukemia data set. The experimental results demonstrate the effectiveness of our method in addressing this problem.
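The three stages named in the abstract (filter ranking, clustering, distance-based selection) can be illustrated with a minimal sketch. The abstract does not specify the filter statistic, the clustering algorithm, or the substitution rule, so everything concrete here is an assumption made for illustration: a signal-to-noise score for ranking, a plain k-means for clustering, and picking the gene closest to each cluster centroid in Euclidean distance. The function names `rank_genes`, `kmeans`, and `select_genes` and the parameters `top` and `k` are hypothetical, and the substitution step is not modeled.

```python
import numpy as np

def rank_genes(X, y):
    """Filter step (assumed statistic): signal-to-noise ratio between two classes.

    X has shape (samples, genes); y holds binary class labels 0/1.
    """
    a, b = X[y == 0], X[y == 1]
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return np.argsort(score)[::-1]  # gene indices, highest score first

def assign(P, centers):
    """Assign each row of P to its nearest center by Euclidean distance."""
    d = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(P, k, iters=50, seed=0):
    """Clustering step (assumed algorithm): plain k-means on gene profiles."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(P, centers)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = P[labels == j].mean(axis=0)
    return assign(P, centers), centers

def select_genes(X, y, top=50, k=5):
    """Keep the top-ranked genes, cluster them, and pick one gene per cluster."""
    ranked = rank_genes(X, y)[:top]
    profiles = X[:, ranked].T  # one row per gene, one column per sample
    labels, centers = kmeans(profiles, k)
    chosen = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        # Representative gene: smallest Euclidean distance to the cluster centroid.
        dist = np.linalg.norm(profiles[members] - centers[j], axis=1)
        chosen.append(int(ranked[members[dist.argmin()]]))
    return chosen
```

Taking one representative per cluster is one simple way to limit redundancy among the selected genes, since genes with similar expression profiles tend to fall into the same cluster. The paper's actual selection and substitution procedure may differ.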