Improving Visual Counterfactual Explanation Models for Image Classification via CLIP

M. Haseyama, Ren Togo, Takahiro Ogawa, Keisuke Maeda, Xiang Li
DOI: https://doi.org/10.1109/GCCE59613.2023.10315277
2023-10-10
Abstract: Deep learning models have achieved remarkable success in computer vision, yet improving the quality of visual counterfactual explanations remains a significant challenge. Visual counterfactual explanation is the task of highlighting the image regions that would have to change for an image to be reclassified into a different category, and it produces explanations that humans find more intuitive. In this paper, we propose a method that uses Contrastive Language-Image Pre-training (CLIP) as an auxiliary model to obtain better feature pairs between the query class and the distractor class, leading to more accurate visual counterfactual explanations. Experimental results on the CUB-200-2011 dataset show that our method yields a 3% improvement in Near-KP and a 0.1 increase in the "number of edits" metric when generating explanations, outperforming existing state-of-the-art methods in the image classification task.
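The abstract does not spell out how CLIP is wired into the explanation pipeline. As a rough illustration only, the sketch below assumes the common greedy formulation of visual counterfactual explanation: spatial feature cells of the query image are swapped, one at a time, for cells of a distractor image until the classifier predicts the distractor class, and the number of swaps is the "number of edits". CLIP's role is modeled here, hypothetically, as an additive cosine-similarity bonus (weight `lam`) between auxiliary embeddings `aux_q` and `aux_d` of the candidate cells. The toy linear classifier, the 2-D vectors, and every function name are illustrative assumptions, not the authors' implementation; Near-KP, which needs keypoint annotations, is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def logits(cells: Sequence[Vector], weights: Sequence[Vector]) -> List[float]:
    """Toy classifier head (assumption): linear layer on mean-pooled feature cells."""
    dim = len(cells[0])
    pooled = [sum(c[d] for c in cells) / len(cells) for d in range(dim)]
    return [sum(w * p for w, p in zip(row, pooled)) for row in weights]


def argmax(xs: Sequence[float]) -> int:
    # max() returns the first maximal index, so ties go to the lower class id.
    return max(range(len(xs)), key=lambda k: xs[k])


def greedy_counterfactual(query, distractor, weights, target, aux_q, aux_d, lam=0.5):
    """Swap query cells for distractor cells until the prediction becomes `target`.

    Each step picks the (query cell i, distractor cell j) pair maximising
    target-class logit + lam * cosine(aux_q[i], aux_d[j]). The auxiliary term
    is a hypothetical stand-in for CLIP-based pair guidance.
    Returns the list of (i, j) edits and whether the prediction flipped.
    Assumes `distractor` is non-empty.
    """
    cells = [list(c) for c in query]
    edits: List[Tuple[int, int]] = []
    used = set()  # each query cell is replaced at most once
    while argmax(logits(cells, weights)) != target and len(used) < len(cells):
        best = None
        for i in range(len(cells)):
            if i in used:
                continue
            for j, d_cell in enumerate(distractor):
                trial = cells[:i] + [list(d_cell)] + cells[i + 1:]
                score = logits(trial, weights)[target] + lam * cosine(aux_q[i], aux_d[j])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        cells[i] = list(distractor[j])
        used.add(i)
        edits.append((i, j))
    return edits, argmax(logits(cells, weights)) == target


# Toy example: 4 spatial cells with 2-D features; class 0 = query, class 1 = distractor.
weights = [[1.0, 0.0], [0.0, 1.0]]
query = [[1.0, 0.0]] * 4
distractor = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
aux_q = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical CLIP-like cell embeddings
aux_d = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

edits, flipped = greedy_counterfactual(query, distractor, weights, target=1, aux_q=aux_q, aux_d=aux_d)
print(edits, flipped, len(edits))  # [(0, 1), (1, 0), (2, 1)] True 3
```

In the toy run, the similarity term steers each swap toward pairs whose auxiliary embeddings agree (query cell 0 with distractor cell 1, and so on), and three edits are needed before the prediction flips. Raising or lowering `lam` trades classifier evidence against pair consistency.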
Computer Science