Supervised feature selection method via potential value estimation

Long Zhao, LinFeng Jiang, XiangJun Dong
DOI: https://doi.org/10.1007/s10586-016-0635-0
2016-09-17
Cluster Computing
Abstract: Feature selection is an important step in dealing with high-dimensional data. To select class-related features, the importance of each feature must be measured. Existing importance measures cannot reflect the different distributions of the data space and are poorly interpretable. In this paper, a new feature weight calculation method based on potential value estimation is proposed. The potential values reflect the different data distributions in different dimensions. The quality of the data points is another parameter needed to calculate their potential values in the data field; it is related to the density and the classes of the surrounding points. At the same time, extracting important features requires considering not only the distribution of the feature itself but also its correlation with other features and with the class labels. The method uses $S_{w}$ (the potential value within a class) and $S_{b}$ (the potential value between different classes) to calculate the information entropy of each feature, and the representative features are selected to construct the classifier. To speed up computation, each dimension is divided into its own grid. By estimating the potential values of data points along the same dimension, the correlation between a feature and the label is evaluated. Analysis and experiments show that the proposed method maintains overall classification accuracy while using the fewest features, and that its dimensionality reduction effect is significantly better than that of FRGDF and other mutual information methods.
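The abstract does not specify the potential function, how point quality (mass) is computed, or exactly how $S_{w}$ and $S_{b}$ enter the entropy, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes unit mass per point (the paper instead ties quality to local density and neighbouring classes), a Gaussian potential $\exp(-(r/\sigma)^2)$, and a 1-D grid per feature whose cells carry per-class point counts as masses. For a point of class $c$ in cell $k$, $S_{w}$ is the class-$c$ potential at that cell and $S_{b}$ is the summed potential of the other classes. The feature score, $H(Y)$ minus the entropy of the normalised per-class potential split, is an information-gain style criterion chosen here for illustration. The names `potential_feature_scores`, `n_bins`, and `sigma_cells` are hypothetical.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def potential_feature_scores(X, y, n_bins=10, sigma_cells=1.0):
    """Hypothetical data-field feature score (higher = more class-relevant).

    Assumptions (not from the paper): unit mass per point, Gaussian potential
    on a per-feature 1-D grid, and score = H(Y) - H(Y | potential split).
    """
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    h_y = entropy([y.count(c) / n for c in classes])
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # constant feature: all values land in cell 0
        # Grid step: per-cell class counts act as the masses of each cell.
        cell_of = [min(int((v - lo) / width), n_bins - 1) for v in col]
        counts = [Counter() for _ in range(n_bins)]
        for k, c in zip(cell_of, y):
            counts[k][c] += 1
        # Potential of each class at each cell centre (distance in cell units),
        # computed cell-to-cell instead of point-to-point.
        phi = [
            [
                sum(counts[m][c] * math.exp(-((k - m) / sigma_cells) ** 2)
                    for m in range(n_bins))
                for c in classes
            ]
            for k in range(n_bins)
        ]
        # For a point of class c in cell k: S_w = phi[k][c], S_b = sum(phi[k]) - S_w.
        # The entropy of the normalised per-class split measures class uncertainty.
        cond = 0.0
        for k in range(n_bins):
            n_k = sum(counts[k].values())
            if n_k == 0:
                continue  # empty cells hold no points to score
            total = sum(phi[k])
            cond += n_k / n * entropy([p / total for p in phi[k]])
        scores.append(h_y - cond)
    return scores
```

On a toy set such as `X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [1.2, 2.0]]` with `y = [0, 0, 0, 1, 1, 1]` and `n_bins=4`, feature 0 (which separates the classes) scores close to 1 bit and feature 1 (noise) scores close to 0. Evaluating potentials between grid cells rather than between all pairs of points costs roughly O(`n_bins`² × classes) per feature instead of O(n²), which is one plausible reading of the grid acceleration described in the abstract. `sigma_cells` controls smoothing: as it grows, every cell's class split approaches the global class proportions and all scores drift toward zero.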