GINT: A Generative Interpretability Method Via Perturbation in the Latent Space

Caizhi Tang, Qing Cui, Longfei Li, Jun Zhou
DOI: https://doi.org/10.1016/j.eswa.2023.120570
IF: 8.5
2023-01-01
Expert Systems with Applications
Abstract: As neural networks grow deeper, model interpretation becomes more necessary and important, especially in high-risk fields such as medicine and finance. From the perspective of feature attribution, most existing approaches aim to identify the features that contribute most to a prediction. Among them, perturbation-based methods mainly probe the model's output by randomly perturbing the given features. When the data are high-dimensional and sparse, perturbing the feature space may be inefficient and meaningless, because it ignores the correlations among features in the data distribution. In this paper, we introduce a novel Generative INTerpretability method, named GINT, which generates perturbations in the latent space. We propose a unified framework for perturbation-based methods that describes the characteristics a perturbation should have to be suitable for interpretation. Under this framework, we use a generative model to generate perturbations instead of perturbing randomly. We then sample perturbations from the generative model for a given instance and its prediction, and compute feature importance by analyzing those perturbations. We conduct extensive experiments to validate the effectiveness and efficiency of the proposed method.
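The core idea in the abstract is to perturb an instance in a latent space and decode, rather than perturbing raw features at random. A toy sketch of that idea follows. It is not GINT itself, and every name in it is hypothetical. The linear `decode` function stands in for a trained generative model, `model` is a made-up black-box classifier, and the instance's latent code `z_star` is assumed to be known (in practice an encoder would supply it). Features are scored by the absolute correlation between each feature's change and the change in model output. That scoring rule is an illustrative assumption, not the paper's importance measure.

```python
import math
import random

# Hypothetical linear "generative model": x = MU + W @ z with a 2-D latent z.
# Features 0 and 1 are both driven by z[0] (correlated), feature 2 by z[1],
# and feature 3 is a sparse feature that is always 0 in this toy data.
MU = [0.5, -0.5, 0.2, 0.0]
W = [
    [1.0, 0.0],
    [-1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]


def decode(z):
    """Map a latent vector z to feature space (stand-in for a trained decoder)."""
    return [mu + sum(w * zk for w, zk in zip(row, z)) for mu, row in zip(MU, W)]


def model(x):
    """Hypothetical black-box classifier that only reads features 0 and 2."""
    logit = 2.0 * x[0] + 0.5 * x[2] - 1.0
    return 1.0 / (1.0 + math.exp(-logit))


def correlation(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def latent_perturbation_importance(z_star, n_samples=2000, sigma=0.3, seed=0):
    """Score features by perturbing the latent code instead of the raw features."""
    rng = random.Random(seed)
    x_star = decode(z_star)
    f_star = model(x_star)
    deltas = [[] for _ in x_star]  # per-feature change x_j - x*_j
    outputs = []                   # change in model output f(x) - f(x*)
    for _ in range(n_samples):
        # Gaussian noise in latent space, then decode back to feature space.
        z = [zk + rng.gauss(0.0, sigma) for zk in z_star]
        x = decode(z)
        for j, (xj, xsj) in enumerate(zip(x, x_star)):
            deltas[j].append(xj - xsj)
        outputs.append(model(x) - f_star)
    # Importance = |corr(feature change, output change)| over the samples.
    return [abs(correlation(d, outputs)) for d in deltas]


if __name__ == "__main__":
    scores = latent_perturbation_importance(z_star=[0.2, -0.1])
    print([round(s, 2) for s in scores])
```

Because feature 3 is always zero in the toy data, latent perturbations never move it, and it receives a score of 0. A random feature-space perturbation would instead push it to values the data never contains. Feature 1 scores as high as feature 0 even though the model ignores it, because both are tied to `z[0]`. This simple correlation score cannot separate perfectly correlated features.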
What problem does this paper attempt to address?