Information Technology and Quantitative Management, ITQM 2013

Xi Zhao, Wei Deng, Yong Shi
2013-01-01
Abstract: Feature selection is usually a separate procedure which cannot benefit from the results of data exploration. In this paper, we propose an unsupervised feature selection method which can reuse a specific data exploration result. Furthermore, our algorithm follows the idea of clustering attributes and combines two state-of-the-art data analysis methods, namely the maximal information coefficient and affinity propagation. Classification problems with different classifiers were tested to validate our method and others. The experimental results show that our unsupervised algorithm is comparable with classical feature selection methods and even outperforms some supervised learning algorithms. A simulation on a credit dataset of our own from a bank in China shows the capability of our method for real-world application.

Keywords: feature selection, feature clustering, maximal information coefficient, affinity propagation

1. Introduction

Data mining shows a powerful capability for automatically identifying valuable and potential information from data, so many areas have profited from it, such as expert systems, decision support and financial forecasting [1]. Due to the widespread use of the Internet and the emergence of bioinformatics, the dimensionality of datasets has become larger and larger. Datasets with hundreds or thousands of attributes may cause the "curse of dimensionality" problem. Furthermore, some traditional classification and clustering algorithms cannot work properly on such data. One of the most feasible techniques to cope with this problem is feature reduction. Feature reduction refers to the study of methods that represent the original data with reduced dimensions [2]. In general, there are two categories of feature reduction, namely feature selection (or variable selection) and feature extraction (or feature transform).
Feature extraction tries to construct a new feature space by transforming the original feature space into a lower-dimensional one; methods such as PCA and LLE have gained broad appeal [3]. However, the transformation result of feature extraction is quite difficult to interpret and explain. This drawback limits the use of this kind of method in some areas. Feature selection, by contrast, does not make any transformation, but filters out meaningless attributes from the original data. In other words, this category of methods chooses a subset of the original feature space.
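The contrast between the two categories can be made concrete with a small sketch. It is only an illustration, not the method proposed in this paper: the toy data, the column names, the variance filter used as a stand-in selection criterion, and the hand-fixed weight vector (which PCA would instead learn from the data) are all assumptions.

```python
from statistics import pvariance

# Toy dataset: rows are samples, columns are named attributes.
# Names and values are hypothetical, chosen only for illustration.
feature_names = ["income", "age", "constant_flag"]
rows = [
    [30.0, 25.0, 1.0],
    [45.0, 32.0, 1.0],
    [60.0, 41.0, 1.0],
    [75.0, 58.0, 1.0],
]


def select_features(rows, names, min_variance=1e-9):
    """Feature selection: keep a subset of the original columns.

    A simple variance filter stands in for a real selection criterion.
    """
    columns = list(zip(*rows))
    kept = [j for j, col in enumerate(columns) if pvariance(col) > min_variance]
    reduced = [[row[j] for j in kept] for row in rows]
    return reduced, [names[j] for j in kept]


def extract_features(rows, weights):
    """Feature extraction: build new columns as weighted sums of all columns.

    Each weight vector defines one derived feature. The vectors are fixed
    by hand here; a method such as PCA would estimate them from the data.
    """
    return [
        [sum(w * x for w, x in zip(weight_vector, row)) for weight_vector in weights]
        for row in rows
    ]


selected, selected_names = select_features(rows, feature_names)
extracted = extract_features(rows, weights=[[0.5, 0.5, 0.0]])

print(selected_names)  # ['income', 'age']: surviving columns keep their meaning
print(extracted[0])    # [27.5]: a blend of income and age with no original name
```

The selected columns can still be read as "income" and "age", whereas the extracted value is a mixture that has no direct counterpart in the original attributes, which is the interpretability drawback described above.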