Consistent Feature Selection for Analytic Deep Neural Networks

Vu Dinh, Lam Si Tung Ho
DOI: https://doi.org/10.48550/arXiv.2010.08097
2020-10-16
Abstract: One of the most important steps toward interpretability and explainability of neural network models is feature selection, which aims to identify the subset of relevant features. Theoretical results in the field have mostly focused on the prediction aspect of the problem with virtually no work on feature selection consistency for deep neural networks due to the model's severe nonlinearity and unidentifiability. This lack of theoretical foundation casts doubt on the applicability of deep learning to contexts where correct interpretations of the features play a central role. In this work, we investigate the problem of feature selection for analytic deep networks. We prove that for a wide class of networks, including deep feed-forward neural networks, convolutional neural networks, and a major sub-class of residual neural networks, the Adaptive Group Lasso selection procedure with Group Lasso as the base estimator is selection-consistent. The work provides further evidence that Group Lasso might be inefficient for feature selection with neural networks and advocates the use of Adaptive Group Lasso over the popular Group Lasso.
Machine Learning, Statistics Theory
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses feature selection consistency in deep neural networks (DNNs): whether a selection procedure recovers the subset of input features that are actually related to the output, which is a key step toward interpretability and explainability. Deep neural networks predict well, but they are severely nonlinear and unidentifiable, so existing theory has focused mainly on prediction, with very little work on feature selection consistency.

### Background and Motivation

1. **Interpretability and Explainability**: Deep neural networks are often viewed as "black box" models whose predictions lack transparency. This is a significant obstacle in application areas that require correct interpretation of the features, such as the medical and engineering sciences.
2. **Limitations of Existing Methods**: Regularization-based feature selection methods such as Group Lasso exist, but their theoretical properties have not been fully studied, especially for deep networks.
3. **Lack of Theoretical Foundation**: Current theoretical results on feature selection consistency mainly cover shallow networks or specific posterior distributions, and there is no systematic treatment of deep networks.

### Main Contributions

1. **Theoretical Proof**: The paper proves that for a wide class of analytic deep neural networks (including deep feed-forward neural networks, convolutional neural networks, and a major sub-class of residual neural networks), the Adaptive Group Lasso selection procedure with Group Lasso as the base estimator is selection-consistent.
2. **Method Comparison**: The paper provides theoretical and experimental evidence that Group Lasso might be inefficient for feature selection with neural networks, and it advocates Adaptive Group Lasso over the popular Group Lasso.
3. **Practical Application**: The paper compares the two procedures on simulation experiments and real datasets (such as the Boston housing dataset), where Adaptive Group Lasso performs better at feature selection.

### Conclusion

This paper provides a theoretical foundation for feature selection in deep neural networks by proving that Adaptive Group Lasso is selection-consistent for a wide class of analytic networks. The result helps fill a gap in the theory and supports the use of deep learning in application areas that require correct interpretation of the features.
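
To make the difference between the two penalties concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's exact formulation: it assumes that the group for an input feature is the column of first-layer weights attached to that feature, and that the adaptive penalty rescales each group norm by the inverse of the base (Group Lasso) estimate's group norm raised to a power `gamma`. The function names and the parameters `lam`, `gamma`, `eps`, and `tol` are hypothetical and chosen only for this example.

```python
import numpy as np


def group_norms(W):
    """Euclidean norm of each feature's group of first-layer weights.

    W has shape (n_hidden, n_features); column k is assumed to hold all
    first-layer weights attached to input feature k.
    """
    return np.linalg.norm(W, axis=0)


def group_lasso_penalty(W, lam):
    """Group Lasso: lam times the sum of the group norms."""
    return lam * group_norms(W).sum()


def adaptive_group_lasso_penalty(W, W_base, lam, gamma=1.0, eps=1e-12):
    """Adaptive Group Lasso: each group norm is divided by the base
    estimate's group norm raised to gamma.

    Groups that the base estimator shrank to (near) zero receive a very
    large weight. eps is an assumption of this sketch that avoids
    division by zero.
    """
    base = np.maximum(group_norms(W_base), eps)
    return lam * (group_norms(W) / base**gamma).sum()


def selected_features(W, tol=1e-8):
    """Indices of features whose group norm exceeds tol."""
    return np.flatnonzero(group_norms(W) > tol)


# Hypothetical base estimate: feature 1 has been zeroed out entirely,
# feature 2 is small but nonzero.
W_base = np.array([[3.0, 0.0, 0.1],
                   [4.0, 0.0, 0.0]])

print(group_lasso_penalty(W_base, lam=1.0))
print(adaptive_group_lasso_penalty(W_base, W_base, lam=1.0))
print(selected_features(W_base))  # features 0 and 2 remain
```

In this sketch the adaptive weights penalize a group more heavily the smaller its base estimate is, which is the intuition behind running Group Lasso first and Adaptive Group Lasso second. A real implementation would add one of these penalties to the training loss of the network and minimize the sum.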