Analyzing feature importance with neural-network-derived trees

Vieira-Manzanera, Ernesto
DOI: https://doi.org/10.1007/s00521-024-10811-0
2024-12-14
Neural Computing and Applications
Abstract: This research focuses on exploiting transformations that derive tree-like models from pre-trained neural networks in order to enhance their explainability and interpretability. Building upon recent work showing that training a neural network amounts to finding a partition of the input space on which local linear models are fitted, we analyze these models and the decision boundaries in classification problems analytically to obtain an importance measure for each feature. A comparative analysis across models trained on diverse datasets in which explainability is relevant aims to derive an equivalent representation between a black-box model and a white-box one. From this new representation, a statistics-based methodology is proposed to determine the relevance of the input features to the problem, thereby making the model at hand more interpretable.
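The idea of a network acting as a partition of the input space with one linear model per region can be illustrated with a minimal sketch. It assumes a one-hidden-layer ReLU network with hand-picked weights; the function names (`relu_net`, `local_linear_model`, `mean_abs_coefficients`) and the averaged-coefficient score are hypothetical illustrations, not the paper's transformation or its importance statistic.

```python
def relu_net(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer ReLU network (pure Python lists)."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]


def local_linear_model(x, W1, b1, W2, b2):
    """Return (pattern, A, c) with net(z) = A z + c on the region containing x.

    The on/off pattern of the hidden units identifies the region, much like
    a root-to-leaf path identifies a leaf in a tree.
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    pattern = tuple(1 if p > 0 else 0 for p in pre)
    hidden = range(len(W1))
    A = [[sum(row[j] * pattern[j] * W1[j][i] for j in hidden)
          for i in range(len(x))] for row in W2]
    c = [sum(row[j] * pattern[j] * b1[j] for j in hidden) + b
         for row, b in zip(W2, b2)]
    return pattern, A, c


def mean_abs_coefficients(samples, W1, b1, W2, b2):
    """Illustrative score only (an assumption, not the paper's measure):
    average |local coefficient| of each feature for the first output."""
    totals = [0.0] * len(samples[0])
    for s in samples:
        _, A, _ = local_linear_model(s, W1, b1, W2, b2)
        totals = [t + abs(a) for t, a in zip(totals, A[0])]
    return [t / len(samples) for t in totals]


# Hypothetical weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]
b1 = [0.0, -1.0, 0.5]
W2 = [[1.0, -2.0, 0.5]]
b2 = [0.1]

x = [1.0, 0.5]
pattern, A, c = local_linear_model(x, W1, b1, W2, b2)
y_net = relu_net(x, W1, b1, W2, b2)[0]
y_lin = sum(a * xi for a, xi in zip(A[0], x)) + c[0]
scores = mean_abs_coefficients([[1.0, 0.5], [-1.0, 0.0]], W1, b1, W2, b2)
```

Within a region the network and its local linear model agree exactly, so `y_net` and `y_lin` coincide at `x`; the coefficients in `A` are what a per-region analysis of feature relevance would inspect.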
computer science, artificial intelligence