Abstract: Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature selection include building simpler and more comprehensible models, improving data mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented substantial challenges and opportunities for feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data, and streaming data. Methodologically, to emphasize the differences and similarities of most existing feature selection algorithms for conventional data, we categorize them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based, and statistical-based methods. To facilitate and promote research in this community, we also present an open-source feature selection repository that consists of most of the popular feature selection algorithms (http://featureselection.asu.edu/). We also use it as an example to show how to evaluate feature selection algorithms. At the end of the survey, we discuss some open problems and challenges that require more attention in future research.

Feature Selection Based on Dependency Margin

A Constrained Feature Selection Approach Based on Feature Clustering and Hypothesis Margin Maximization

Unsupervised Feature Analysis with Class Margin Optimization

Feature Selection for Monotonic Classification Via Maximizing Monotonic Dependency

Large-margin Feature Selection for Monotonic Classification

Dependence Guided Unsupervised Feature Selection

Efficient Leave-One-out Strategy for Supervised Feature Selection

A Feature Selection Method Based on Feature Grouping and Genetic Algorithm

Neurodynamics-driven supervised feature selection

Feature Selection Based on a New Dependency Measure

An Optimal Feature Subset Selection Method Based On Distance Discriminant And Distribution Overlapping

Maximum margin and global criterion based-recursive feature selection

A Feature Selection Framework Based on Supervised Data Clustering

Feature Selection with Integrated Relevance and Redundancy Optimization

An Adaptive Feature Selection Method for Multi-Class Classification

A Novel Margin Based Algorithm for Feature Extraction

Margin-maximizing feature elimination methods for linear and nonlinear kernel-based discriminant functions

Invariant optimal feature selection: A distance discriminant and feature ranking based solution

Semi-supervised feature selection based on discernibility matrix and mutual information

Effective Learning with Joint Discriminative and Representative Feature Selection

Feature Selection: A Data Perspective