Abstract: Feature selection is usually a separate procedure that cannot benefit from the results of data exploration. In this paper, we propose an unsupervised feature selection method that can reuse a specific data exploration result. Our algorithm follows the idea of clustering attributes and combines two state-of-the-art data analysis methods: the maximal information coefficient and affinity propagation. We validate our method against others on classification problems with different classifiers. Experimental results show that our unsupervised algorithm is comparable with classical feature selection methods and even outperforms some supervised learning algorithms. A simulation on our own credit dataset from a bank in China shows the capability of our method for real-world applications.

Keywords: feature selection, feature clustering, maximal information coefficient, affinity propagation

1. Introduction

Data mining is a powerful tool for automatically identifying valuable and potentially useful information in data, so many areas have profited from it, such as expert systems, decision support, and financial forecasting [1]. Due to the widespread use of the Internet and the emergence of bioinformatics, the dimensionality of datasets is becoming larger and larger. Datasets with hundreds or thousands of attributes may cause the "curse of dimensionality" problem, and some traditional classification and clustering algorithms cannot work properly on them. One of the most feasible techniques for coping with this problem is feature reduction. Feature reduction refers to the study of methods that represent the original data with reduced dimensions [2]. In general, there are two categories of feature reduction: feature extraction (or feature transformation) and feature selection (or variable selection).
Feature extraction constructs a new, lower-dimensional feature space by transforming the original one; PCA and LLE are widely used examples [3]. However, the transformed features are quite difficult to interpret and explain, which limits the use of such methods in some areas. Feature selection, by contrast, applies no transformation; it filters uninformative attributes out of the original data. In other words, it chooses a subset of the original feature space.
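To make the contrast between the two categories concrete, here is a minimal sketch in plain Python. The attribute names, the data values, the kept columns, and the projection weights are all made up for illustration and do not come from the paper; a method such as PCA would learn the weights from the data rather than take them as given.

```python
# Toy data: 4 samples x 3 named attributes (values are made up for illustration).
feature_names = ["age", "income", "debt"]
X = [
    [25.0, 30.0, 5.0],
    [40.0, 80.0, 20.0],
    [35.0, 60.0, 10.0],
    [50.0, 90.0, 30.0],
]


def select_features(rows, keep):
    """Feature selection: keep a subset of the original columns, unchanged."""
    return [[row[j] for j in keep] for row in rows]


def extract_features(rows, weights):
    """Feature extraction: each new feature is a weighted sum of all original columns."""
    return [[sum(w * x for w, x in zip(ws, row)) for ws in weights] for row in rows]


# Hypothetical choice of columns to keep: "age" and "debt".
keep = [0, 2]
X_sel = select_features(X, keep)
sel_names = [feature_names[j] for j in keep]

# Hypothetical projection weights producing 2 new features.
W = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, -0.5],
]
X_ext = extract_features(X, W)

print(sel_names, X_sel[0])  # selected columns keep their original names and values
print(X_ext[0])             # extracted features are mixtures with no original name
```

Both outputs have two columns, but only the selected ones can still be read as "age" and "debt"; the extracted columns mix several attributes, which is the interpretability drawback described above.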

Unsupervised Spectral Feature Selection Algorithms for High Dimensional Data

U^2F^2S^2 : Uncovering Feature-level Similarities for Unsupervised Feature Selection

The Unsupervised Feature Selection Algorithms Based on Standard Deviation and Cosine Similarity for Genomic Data Analysis

Clustering-Guided Sparse Structural Learning for Unsupervised Feature Selection

Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking

Spectral Self-supervised Feature Selection

Unsupervised Feature Selection Using Nonnegative Spectral Analysis.

A New Unsupervised Feature Selection Algorithm Using Similarity-Based Feature Clustering.

Classification of High-dimensional Time Series in Spectral Domain using Explainable Features

Feature Selection Using Hierarchical Feature Clustering

Unsupervised feature selection for multi-cluster data

Unsupervised feature selection via discrete spectral clustering and feature weights

Subspace Learning for Unsupervised Feature Selection Via Matrix Factorization.

Information Technology and Quantitative Management , ITQM 2013

Discovering a sparse set of pairwise discriminating features in high dimensional data

Low-rank Unsupervised Graph Feature Selection Via Feature Self-Representation.

Unsupervised Feature Analysis with Class Margin Optimization

Unsupervised spectral mapping and feature selection for hyperspectral anomaly detection

A Novel Feature Selection Method Based on MRMR and Enhanced Flower Pollination Algorithm for High Dimensional Biomedical Data

Unsupervised Feature Selection Algorithm Based on Sparse Representation