Gene Selection Algorithm Based on Correlation Analysis

王明怡, 吴平, 王德林
DOI: https://doi.org/10.3785/j.issn.1008-973x.2004.10.012
2004-01-01
Abstract: Gene sets selected from microarray data by conventional ranking methods typically contain many highly correlated genes, which degrades classifier performance. To filter out these redundant genes (features), an unsupervised feature selection algorithm is proposed. The algorithm partitions the original feature set into a number of homogeneous subsets (clusters) and selects one representative feature from each cluster. The partitioning follows k-NN (k-nearest-neighbor) principles and uses pairwise feature correlation measures. The method does not require the optimal number of clusters to be specified in advance and has lower computational complexity. Experiments on real biological data show that the algorithm significantly improves the classification accuracy of existing classifiers.
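The abstract describes the procedure only in outline, so the following is a minimal sketch under stated assumptions, not the authors' exact algorithm. It assumes the pairwise measure is the absolute Pearson correlation, turned into a distance as 1 - |r|, and it uses a greedy k-NN rule chosen here for illustration: repeatedly pick the gene whose k-th nearest neighbor is closest (the most compact neighborhood), keep it as a cluster representative, and discard its k nearest neighbors as redundant. The function names `correlation_distance` and `knn_select_genes` are hypothetical.

```python
import numpy as np


def correlation_distance(X):
    # X has shape (n_samples, n_genes); correlate gene columns.
    # Assumption: distance = 1 - |Pearson r|, so strongly positively or
    # negatively correlated genes are treated as close (redundant).
    r = np.corrcoef(X, rowvar=False)
    return 1.0 - np.abs(r)


def knn_select_genes(X, k):
    """Hypothetical greedy k-NN partitioning: one representative per cluster."""
    D = correlation_distance(X)
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining:
        # Shrink k when fewer than k + 1 genes are left.
        k_eff = min(k, len(remaining) - 1)
        if k_eff == 0:
            selected.append(remaining[0])
            break
        # Distances among the genes still in play (fancy indexing copies).
        sub = D[np.ix_(remaining, remaining)]
        np.fill_diagonal(sub, np.inf)  # a gene is not its own neighbor
        order = np.argsort(sub, axis=1)  # neighbors sorted by distance
        kth = sub[np.arange(len(remaining)), order[:, k_eff - 1]]
        c = int(np.argmin(kth))  # gene with the most compact neighborhood
        selected.append(remaining[c])
        # Gene c represents its cluster; drop it and its k_eff neighbors.
        drop = {c, *order[c, :k_eff].tolist()}
        remaining = [g for j, g in enumerate(remaining) if j not in drop]
    return selected
```

In this sketch the number of clusters is never given directly; it emerges from k and the correlation structure, which mirrors the abstract's claim that the optimal cluster count need not be fixed in advance. For example, `knn_select_genes(X, k=2)` on an expression matrix made of three groups of three near-duplicate genes should return one gene index per group. Genes with constant expression yield undefined correlations and would need to be filtered out first.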