Consistent Feature Selection for Analytic Deep Neural Networks

Vu Dinh, Lam Si Tung Ho

DOI: https://doi.org/10.48550/arXiv.2010.08097

2020-10-16

Abstract: One of the most important steps toward interpretability and explainability of neural network models is feature selection, which aims to identify the subset of relevant features. Theoretical results in the field have mostly focused on the prediction aspect of the problem with virtually no work on feature selection consistency for deep neural networks due to the model's severe nonlinearity and unidentifiability. This lack of theoretical foundation casts doubt on the applicability of deep learning to contexts where correct interpretations of the features play a central role. In this work, we investigate the problem of feature selection for analytic deep networks. We prove that for a wide class of networks, including deep feed-forward neural networks, convolutional neural networks, and a major sub-class of residual neural networks, the Adaptive Group Lasso selection procedure with Group Lasso as the base estimator is selection-consistent. The work provides further evidence that Group Lasso might be inefficient for feature selection with neural networks and advocates the use of Adaptive Group Lasso over the popular Group Lasso.
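The two-stage procedure named in the abstract can be sketched in generic notation as follows. The notation is an assumption, not taken from this page: $\ell_n(\theta)$ is the empirical loss of a network with parameters $\theta$, $\mathbf{w}_j$ collects the first-layer weights attached to input feature $j$ (so $\mathbf{w}_j = \mathbf{0}$ removes feature $j$ from the network), and $\lambda_n, \zeta_n > 0$ and $\gamma > 0$ are tuning parameters. Stage 1 is the Group Lasso base estimator; stage 2 reweights each group's penalty by the inverse of its stage-1 norm, with the convention $1/0 = \infty$, so a feature dropped in stage 1 stays dropped.

```latex
\begin{aligned}
\hat\theta^{\mathrm{GL}} &= \arg\min_{\theta}\; \ell_n(\theta) + \lambda_n \sum_{j=1}^{p} \lVert \mathbf{w}_j \rVert_2, \\
\hat\theta^{\mathrm{AGL}} &= \arg\min_{\theta}\; \ell_n(\theta) + \zeta_n \sum_{j=1}^{p} \frac{\lVert \mathbf{w}_j \rVert_2}{\lVert \hat{\mathbf{w}}_j^{\mathrm{GL}} \rVert_2^{\gamma}}, \\
\hat S &= \{\, j : \hat{\mathbf{w}}_j^{\mathrm{AGL}} \neq \mathbf{0} \,\}.
\end{aligned}
```

In this notation, selection consistency means $\Pr(\hat S = S^\ast) \to 1$ as $n \to \infty$, where $S^\ast$ is the set of truly relevant features.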

Machine Learning, Statistics Theory

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses feature selection consistency in deep neural networks (DNNs): identifying the subset of input features that are relevant to the output, in order to improve the model's interpretability and explainability. Although deep neural networks achieve strong predictive performance, their severe nonlinearity and unidentifiability mean that existing theory has focused mainly on prediction, with very little work on feature selection consistency.

### Background and Motivation

1. **Interpretability and Explainability**: Deep neural networks are often viewed as "black box" models whose predictions lack transparency. This is a significant obstacle in application areas that require correct interpretation of the features, such as the medical and engineering sciences.
2. **Limitations of Existing Methods**: Regularization-based feature selection methods such as Group Lasso exist, but their theoretical properties have not been fully studied, especially for deep networks.
3. **Lack of Theoretical Foundation**: Current theoretical results on feature selection consistency focus mainly on shallow networks or on specific posterior distributions; deep networks have not been studied systematically.

### Main Contributions

1. **Theoretical Proof**: For a wide class of analytic deep neural networks (including deep feed-forward neural networks, convolutional neural networks, and a major subclass of residual neural networks), the paper proves that the Adaptive Group Lasso selection procedure, with Group Lasso as the base estimator, is selection-consistent.
2. **Method Comparison**: The paper provides further evidence that Group Lasso may be inefficient for feature selection with neural networks, and advocates Adaptive Group Lasso over it.
3. **Practical Application**: The paper illustrates the advantage of Adaptive Group Lasso for feature selection through simulation experiments and real datasets (such as the Boston housing dataset).

### Conclusion

This paper provides a theoretical foundation for feature selection in deep neural networks by proving that Adaptive Group Lasso is selection-consistent. The result helps fill a gap in the theory and offers a procedure with consistency guarantees for application areas that require correct interpretation of the features.
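The two-stage procedure summarized above can be illustrated with a minimal NumPy sketch. Several choices here are assumptions for illustration, not the authors' implementation: a one-hidden-layer network with the analytic activation `tanh`, features grouped as columns of the first-layer weight matrix, squared-error loss, and proximal gradient descent (group soft-thresholding) as the solver. All function names and hyperparameter values (`lam_gl`, `lam_agl`, `gamma`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np


def group_soft_threshold(W, thresholds):
    """Proximal operator of sum_j thresholds[j] * ||W[:, j]||_2.

    Column j of W holds the first-layer weights attached to input feature j,
    so zeroing that column removes feature j from the network (assumed grouping).
    """
    norms = np.linalg.norm(W, axis=0)
    # Shrink each column's norm by its threshold; columns whose norm does not
    # exceed the threshold (including any with an infinite threshold) become 0.
    scale = np.where(norms > thresholds,
                     1.0 - thresholds / np.maximum(norms, 1e-12), 0.0)
    return W * scale


def adaptive_weights(gl_norms, gamma=1.0):
    """Penalty weights 1 / ||w_j||^gamma; infinite for groups zeroed in stage 1."""
    with np.errstate(divide="ignore"):
        return np.where(gl_norms > 0, gl_norms ** (-gamma), np.inf)


def fit_sparse_input_net(X, y, lam, group_weights, hidden=10, lr=0.05,
                         epochs=2000, seed=0):
    """Hypothetical solver: one-hidden-layer tanh net trained by proximal gradient
    on 0.5 * mean squared error + lam * sum_j group_weights[j] * ||W[:, j]||_2.
    Assumes lam > 0."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.normal(scale=0.5, size=(hidden, p))
    b = np.zeros(hidden)
    v = rng.normal(scale=0.5, size=hidden)
    c = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W.T + b)              # hidden activations, shape (n, hidden)
        g = (H @ v + c - y) / n               # d loss / d prediction
        dZ = np.outer(g, v) * (1.0 - H ** 2)  # backprop through tanh (uses old v)
        dW, db = dZ.T @ X, dZ.sum(axis=0)
        v -= lr * (H.T @ g)
        c -= lr * g.sum()
        b -= lr * db
        # Gradient step on the smooth loss, then the group-lasso proximal step.
        W = group_soft_threshold(W - lr * dW, lr * lam * group_weights)
    return W


def adaptive_group_lasso(X, y, lam_gl, lam_agl, gamma=1.0, **fit_kw):
    """Stage 1: Group Lasso. Stage 2: Group Lasso reweighted by stage-1 norms."""
    p = X.shape[1]
    W_gl = fit_sparse_input_net(X, y, lam_gl, np.ones(p), **fit_kw)
    weights = adaptive_weights(np.linalg.norm(W_gl, axis=0), gamma)
    W_agl = fit_sparse_input_net(X, y, lam_agl, weights, **fit_kw)
    selected = np.flatnonzero(np.linalg.norm(W_agl, axis=0) > 0)
    return selected, W_gl, W_agl


# Usage on hypothetical synthetic data where only features 0 and 1 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)
selected, _, _ = adaptive_group_lasso(X, y, lam_gl=0.05, lam_agl=0.05)
print("selected features:", selected)
```

The infinite stage-2 weight for any group zeroed in stage 1 means the adaptive step can only remove features, never re-admit them. Which features survive in practice depends on the tuning parameters, which this sketch does not tune.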

Sparse-Input Neural Network using Group Concave Regularization

Non-linear Feature Selection Based on Convolution Neural Networks with Sparse Regularization

Stable Feature Selection from Brain Smri

DeepPINK: reproducible feature selection in deep neural networks

LassoLayer: Nonlinear Feature Selection by Switching One-to-one Links

LCEN: A Novel Feature Selection Algorithm for Nonlinear, Interpretable Machine Learning Models

Feature Analysis Network: An Interpretable Idea in Deep Learning

Consistent group selection in high-dimensional linear regression

Convolution Neural Network Feature Importance Analysis and Feature Selection Enhanced Model

Optimal Feature Selection for Sparse Linear Discriminant Analysis and Its Applications in Gene Expression Data

Deep PLS: A Lightweight Deep Learning Model for Interpretable and Efficient Data Analytics

AFS: An Attention-based mechanism for Supervised Feature Selection

Unveiling the Power of Sparse Neural Networks for Feature Selection

Adaptive Feature Selection With Augmented Attributes

Error Controlled Feature Selection for Ultrahigh Dimensional and Highly Correlated Feature Space Using Deep Learning

Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples

Human-in-the-Loop Feature Selection Using Interpretable Kolmogorov-Arnold Network-based Double Deep Q-Network

Group-Feature (Sensor) Selection With Controlled Redundancy Using Neural Networks

Interpret Neural Networks by Extracting Critical Subnetworks

The Contextual Lasso: Sparse Linear Models via Deep Neural Networks