Interpretability as Approximation: Understanding Black-Box Models by Decision Boundary

Hangcheng Dong, Bingguo Liu, Dong Ye, Guodong Liu
DOI: https://doi.org/10.3390/electronics13224339
IF: 2.9
2024-11-06
Electronics
Abstract: Current interpretability methods focus largely on human-understandable semantics, which are less objective. To make interpretability research more objective and standardized, this study provides notions of interpretability based on approximation theory. We first define explainable models in terms of explicitness and then use completeness to define interpretability, thereby turning interpretability into the process of approximating black-box models with interpretable models. In particular, we take the view that the decision boundary of a classification model is equivalent to its interpretability. Next, we implement this approximation-based interpretation on multilayer perceptrons (MLPs) and propose using the MLP as a universal interpreter to explain other complex black-box models. Unlike the LIME method, which can only extract local linear features, our method is global and is therefore termed GIME. Extensive experiments demonstrate the effectiveness of our approaches.
engineering, electrical & electronic; computer science, information systems; physics, applied
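The abstract's central idea, approximating a black-box classifier with an explicit model and judging the result by how well the decision boundaries agree, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the black box is a hypothetical unit-circle classifier, the surrogate is a one-hidden-layer ReLU MLP whose weights are hand-picked rather than trained, and "fidelity" is simply the fraction of grid points on which the two models agree. It is not the GIME procedure itself.

```python
import math

def black_box(x, y):
    """Hypothetical stand-in for a black-box classifier: class 1 inside the unit circle."""
    return 1 if x * x + y * y < 1.0 else 0

def relu(z):
    return z if z > 0.0 else 0.0

# Hand-picked (not trained) hidden layer with 8 ReLU units. Each unit fires only
# when a point lies beyond one edge of the regular octagon that circumscribes
# the unit circle, so the surrogate's decision boundary is piecewise linear.
HIDDEN = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4), -1.0)
          for k in range(8)]  # (w_x, w_y, bias) per hidden unit

def mlp_surrogate(x, y):
    """Explicit surrogate: class 1 iff no hidden unit is active (inside the octagon)."""
    penalty = sum(relu(wx * x + wy * y + b) for wx, wy, b in HIDDEN)
    return 1 if penalty == 0.0 else 0

def fidelity(model_a, model_b, n=61, lim=1.5):
    """Fraction of points on an n x n grid over [-lim, lim]^2 where the models agree."""
    pts = [-lim + 2 * lim * i / (n - 1) for i in range(n)]
    agree = sum(model_a(x, y) == model_b(x, y) for x in pts for y in pts)
    return agree / (n * n)

score = fidelity(black_box, mlp_surrogate)
print(f"decision-boundary agreement: {score:.3f}")
```

Because the surrogate's hidden units are written out explicitly, each linear piece of its boundary can be read directly from `HIDDEN`. In practice the surrogate would be fitted to the black box's predictions rather than set by hand.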
What problem does this paper attempt to address?