Rule extraction for tumor/normal tissue classification based on microarray data

Li Ying-Xin, Jiang Yuan, Zhou Zhi-Hua
DOI: https://doi.org/10.3321/j.issn:0469-5097.2009.05.007
2009-01-01
Abstract: Classification rule extraction is an important technique for acquiring knowledge from data in machine learning and data mining. DNA microarray technology can monitor the expression patterns of thousands of genes simultaneously in a single experiment, and thus offers a route to a comprehensive understanding of the genetic alterations present in tumors. Extracting rules from microarray data for distinguishing tumor tissue samples from normal ones can provide useful information about the underlying nature of carcinogenesis, and it also benefits gene-based diagnosis of tumors. This work addresses the problem of extracting tumor/normal classification rules from broad patterns of gene expression profiles by employing a two-step strategy. The first step employed a feature selection method to remove genes irrelevant to the tissue categories. To obtain accurate gene weights for classification, a feature selection algorithm, RFE-Relief, was proposed based on the Relief algorithm and the strategy of 'Recursive Feature Elimination'. Multiple candidate gene subsets were generated. We used a support vector machine as the classifier to evaluate the classification ability of these gene subsets through cross-validation on the training set, and selected the gene subset with the best classification performance as the feature subset for distinguishing tumor from normal tissue samples. The second step applied the CART algorithm to build a decision tree from the expression levels of the genes in the feature subset, and then a pruning algorithm was used to obtain a reduced tree with improved generalization performance. We applied our method to a dataset containing multiple tumor tissues as well as their normal counterparts to extract rules for accurate tissue classification. A set of rules, represented by a decision tree, for distinguishing tumor tissues from normal ones was obtained. We evaluated these rules on an independent test set, and the results showed that they classify well. At the end of the paper, these classification rules were also analyzed in detail to explore the classification information they carry.
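The gene-elimination loop of the first step can be illustrated with a minimal Python sketch. The abstract does not give the details of RFE-Relief, so everything here is an assumption: the sketch uses the classic two-class Relief weight update, takes every sample once as the reference instance (instead of random sampling) so the result is deterministic, and drops one lowest-weight gene per round. The function names `relief_weights` and `rfe_relief` and the toy expression matrix are hypothetical. The SVM cross-validation that selects among the candidate subsets and the CART step are not shown.

```python
from typing import Dict, List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int],
                   features: Sequence[int]) -> Dict[int, float]:
    """Two-class Relief weights, restricted to the given gene indices."""
    # Per-gene value range, used to scale differences into [0, 1].
    span = {}
    for f in features:
        column = [row[f] for row in X]
        span[f] = (max(column) - min(column)) or 1.0

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def distance(a, b):
        # Manhattan distance over the genes still under consideration.
        return sum(diff(f, a, b) for f in features)

    weights = {f: 0.0 for f in features}
    n = len(X)
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        # Reward genes that differ on the nearest miss and agree on the nearest hit.
        for f in features:
            weights[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n
    return weights


def rfe_relief(X: Sequence[Sequence[float]], y: Sequence[int],
               n_keep: int = 1) -> List[List[int]]:
    """Return nested candidate gene subsets, largest first.

    Weights are recomputed after each removal (assumed elimination schedule).
    """
    remaining = list(range(len(X[0])))
    subsets = [list(remaining)]
    while len(remaining) > n_keep:
        w = relief_weights(X, y, remaining)
        worst = min(remaining, key=lambda f: w[f])
        remaining.remove(worst)
        subsets.append(list(remaining))
    return subsets


# Made-up toy data: 6 samples x 3 genes; label 0 = normal, 1 = tumor.
# Gene 0 separates the classes, genes 1 and 2 do not.
X = [
    [0.1, 0.5, 0.3],
    [0.2, 0.1, 0.6],
    [0.0, 0.9, 0.4],
    [0.9, 0.4, 0.5],
    [1.0, 0.8, 0.7],
    [0.8, 0.2, 0.4],
]
y = [0, 0, 0, 1, 1, 1]

subsets = rfe_relief(X, y)
for s in subsets:
    print(s)
```

Each printed list is one candidate gene subset. In the workflow the abstract describes, a classifier would then be cross-validated on each subset and the best-scoring subset passed on to decision-tree induction.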