Stable local interpretable model-agnostic explanations based on a variational autoencoder

Xu Xiang, Hong Yu, Ye Wang, Guoyin Wang
DOI: https://doi.org/10.1007/s10489-023-04942-5
IF: 5.3
2023-09-26
Applied Intelligence
Abstract: For humans to trust in artificial intelligence (AI) systems, it is essential for machine learning (ML) models to be interpretable to users. For example, the judicial process requires that AI conclusions be rigorous and absolutely interpretable. In this paper, we propose a novel approach, VAE-SLIME, for providing stable local interpretable model-agnostic explanations (SLIME) based on a variational autoencoder (VAE). LIME is a technique that explains the predictions of any classifier in an interpretable and faithful manner. Despite the great success of LIME, the most popular method in this category, it has several disadvantages due to its random perturbation-based sampling method. The VAE-SLIME proposed in this paper is specifically designed to address the lack of stability and local fidelity exhibited by LIME for tabular data. VAE-SLIME first employs fixed noise to replace the random Gaussian noise used by the reparameterization trick of the VAE. Then, it uses this new VAE model instead of the random perturbation method to generate stable samples. By considering the sequential relationship and flipping of features, a novel explanation stability evaluation metric, the feature sequence stability index (FSSI), is introduced to accurately evaluate the stability of explanations. In a comparison with 6 state-of-the-art approaches on 7 commonly used tabular datasets, the experimental results show beyond doubt that the explanations produced by our approach are the most stable, and its local fidelity is 65.17% higher than that of other approaches on average.
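The fixed-noise idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the noise is fixed, so the sketch assumes a single seeded draw that is stored and reused. The encoder outputs `mu` and `log_var` are made-up values, and the helper names `reparameterize`, `draw_noise`, and `latent_samples` are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, eps):
    """Reparameterization trick: z = mu + sigma * eps, with sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]


def draw_noise(n_samples, latent_dim, rng):
    """Draw an n_samples x latent_dim table of standard Gaussian noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n_samples)]


def latent_samples(mu, log_var, noise):
    """Map every noise vector to a latent sample around the encoded instance."""
    return [reparameterize(mu, log_var, eps) for eps in noise]


# Hypothetical encoder output for one tabular instance (illustrative values only).
mu = [0.2, -1.0, 0.5]
log_var = [-2.0, -1.5, -3.0]

# Standard VAE behaviour: fresh Gaussian noise on every call, so two runs differ.
unseeded = random.Random()
z_random_a = latent_samples(mu, log_var, draw_noise(5, 3, unseeded))
z_random_b = latent_samples(mu, log_var, draw_noise(5, 3, unseeded))

# Assumed fixed-noise variant: draw the noise once and reuse it for every explanation run.
FIXED_NOISE = draw_noise(5, 3, random.Random(0))
z_fixed_a = latent_samples(mu, log_var, FIXED_NOISE)
z_fixed_b = latent_samples(mu, log_var, FIXED_NOISE)

print("fixed-noise runs identical:", z_fixed_a == z_fixed_b)
print("random-noise runs identical:", z_random_a == z_random_b)
```

In this sketch the neighbourhood generated around an instance is the same on every run when the noise is fixed, which is the property a stable surrogate explanation needs. With fresh noise, the sampled neighbourhood, and therefore any surrogate model fitted to it, can change between runs.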
computer science, artificial intelligence