Informative gene selection and tumor classification by null space LDA for microarray data

Feng Yue, Kuanquan Wang, Wangmeng Zuo
DOI: https://doi.org/10.1007/978-3-540-74450-4_39
2007-01-01
Abstract: DNA microarray technology can monitor thousands of genes in a single experiment. One important application of this high-throughput gene expression data is to classify samples into known categories. Since the number of genes often exceeds the number of samples, classical classification methods perform poorly in this setting. Furthermore, many genes are irrelevant or redundant and decrease classification accuracy, so a gene selection step is necessary; classification using the selected genes is expected to be more accurate. This paper proposes a novel method for informative gene selection and sample classification on gene expression data. The method is based on Linear Discriminant Analysis (LDA) in the regular space and in the null space of the within-class scatter matrix. Genes with smaller coefficients in the optimal projection basis vectors are filtered out recursively, so the remaining genes become progressively more informative. Experiments on the leukemia dataset and the colon dataset show that the selected genes are much less correlated and have more discriminative power than those selected by classical methods.
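The recursive filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the null-space branch (it assumes there are more genes than samples at every step, and omits LDA in the regular space), and the function names, the per-gene score (summed absolute coefficients), and the `drop_frac` schedule are all hypothetical choices made for illustration.

```python
import numpy as np


def null_space_lda(X, y, tol=1e-10):
    """Discriminant directions in the null space of the within-class scatter.

    X is (n_samples, n_genes), y holds class labels. Returns a matrix W of
    shape (n_genes, k) with orthonormal columns. Illustrative sketch only.
    """
    classes = np.unique(y)
    mu = X.mean(axis=0)
    # Rows of Hw span the range of the within-class scatter Sw = Hw^T Hw.
    Hw = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    # Rows of Hb give the between-class scatter Sb = Hb^T Hb.
    Hb = np.vstack([
        np.sqrt((y == c).sum()) * (X[y == c].mean(axis=0) - mu)
        for c in classes
    ])
    # Orthonormal basis Q of range(Sw), taken from the row space of Hw.
    _, s, Vt = np.linalg.svd(Hw, full_matrices=False)
    Q = Vt[s > tol * max(s.max(), 1.0)].T
    # Remove the range(Sw) component, leaving Hb inside null(Sw).
    Hb_null = Hb - (Hb @ Q) @ Q.T
    _, sb, Vbt = np.linalg.svd(Hb_null, full_matrices=False)
    if sb.size == 0 or sb.max() <= tol:
        raise ValueError("no discriminant direction left in the null space")
    return Vbt[sb > tol * sb.max()].T


def recursive_gene_selection(X, y, n_keep, drop_frac=0.5):
    """Repeatedly drop the genes with the smallest projection coefficients.

    drop_frac (fraction removed per round) is an assumed schedule.
    Returns the sorted column indices of the surviving genes.
    """
    selected = np.arange(X.shape[1])
    while selected.size > n_keep:
        W = null_space_lda(X[:, selected], y)
        score = np.abs(W).sum(axis=1)  # assumed per-gene importance
        n_next = max(n_keep, int(selected.size * (1 - drop_frac)))
        n_next = min(n_next, selected.size - 1)  # always make progress
        top = np.argsort(score)[::-1][:n_next]
        selected = np.sort(selected[top])
    return selected
```

A call such as `recursive_gene_selection(X, y, n_keep=50)` returns the indices of the retained genes; a classifier can then be trained on `X[:, selected]`. Dropping a fixed fraction per round keeps the number of SVD computations logarithmic in the number of genes, which is one reasonable choice among several.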