Concept-Based Explanations in Computer Vision: Where Are We and Where Could We Go?

Jae Hee Lee, Georgii Mikriukov, Gesina Schwalbe, Stefan Wermter, Diedrich Wolter
2024-09-20
Abstract: Concept-based XAI (C-XAI) approaches to explaining neural vision models are a promising field of research, since explanations that refer to concepts (i.e., semantically meaningful parts in an image) are intuitive to understand and go beyond saliency-based techniques that only reveal relevant regions. Given the remarkable progress in this field in recent years, it is time for the community to take a critical look at the advances and trends. Consequently, this paper reviews C-XAI methods to identify interesting and underexplored areas and proposes future research directions. To this end, we consider three main directions: the choice of concepts to explain, the choice of concept representation, and how we can control concepts. For the latter, we propose techniques and draw inspiration from the field of knowledge representation and learning, showing how this could enrich future C-XAI research.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the limitations of current concept-based explainable artificial intelligence (C-XAI) methods in computer vision. Specifically, the authors focus on the following issues:

1. **Limitations of existing C-XAI methods**: Existing C-XAI methods mainly target the attributes of static images and image regions, and largely ignore concept extraction from temporal features and other sensory features. Most of these methods are also applied to convolutional neural networks (CNNs), with less research on other architectures such as Vision Transformers (ViTs) and multimodal models.
2. **Lack of diversity in concept types**: Most current C-XAI research is limited to extracting static attributes from CNNs, ignoring dynamic features (such as motion patterns in videos) and concepts in multimodal data. This limits the potential of C-XAI in complex application scenarios.
3. **Limitations of concept representation**: Most existing concept representations are vector-based, which is too simple to capture complex relationships between concepts and their distributions. For example, point-estimate representations do not handle concept overlap and sub-concepts in large-scale models well.
4. **Insufficient exploration of concept control mechanisms**: Existing C-XAI research pays little attention to how knowledge representation and reasoning can be used to verify and control the concepts inside a model and the relationships between them, which is crucial for ensuring the reliability and controllability of the model.

To address these problems, the authors propose the following research directions:

- **Expand concept types**: Explore how to extract concepts from temporal features, multimodal data, and newer architectures such as ViTs.
- **Improve concept representation methods**: Develop more expressive and flexible concept representations, such as region-based or distribution-based representations, to better capture the diversity and complexity of concepts.
- **Introduce knowledge representation and reasoning**: Draw on results from the field of knowledge representation and reasoning to verify and control the concepts inside the model and the relationships between them, helping to ensure the correctness and reliability of the model.
- **Explore concept control mechanisms**: Study how to control concepts effectively through knowledge representation and learning methods, thereby improving the controllability and interpretability of the model.

Through these research directions, the authors hope to advance the C-XAI field so that it can better explain and control deep-learning models, especially in computer vision.
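
The contrast between point-estimate and distribution-based concept representations raised above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not a method from the paper: the activations are synthetic, the 8-dimensional latent space is arbitrary, the point estimate is a simple difference of class means (a simplification of learned concept vectors), the distribution is a single Gaussian, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for layer activations (assumption: 8-dimensional latent space).
dim = 8
concept_acts = rng.normal(loc=2.0, scale=0.5, size=(200, dim))  # samples showing the concept
random_acts = rng.normal(loc=0.0, scale=1.0, size=(200, dim))   # samples without it


def point_estimate_concept(pos, neg):
    """Represent a concept as one unit vector (difference of class means)."""
    direction = pos.mean(axis=0) - neg.mean(axis=0)
    return direction / np.linalg.norm(direction)


def distribution_concept(pos):
    """Represent a concept as a Gaussian (mean and covariance) over activations."""
    mean = pos.mean(axis=0)
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(pos, rowvar=False) + 1e-6 * np.eye(pos.shape[1])
    return mean, cov


def mahalanobis(x, mean, cov):
    """Distance of activation x from the concept distribution."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


concept_vector = point_estimate_concept(concept_acts, random_acts)
concept_mean, concept_cov = distribution_concept(concept_acts)

probe = concept_acts[0]
projection = float(probe @ concept_vector)                 # point estimate: one scalar score
distance = mahalanobis(probe, concept_mean, concept_cov)   # distribution: spread-aware score
```

The single vector reduces the concept to one direction, so it carries no information about how widely the concept's activations spread or whether they overlap with another concept. The Gaussian keeps a covariance, which is one simple way to express that spread; richer choices (mixtures, regions) would be needed to model sub-concepts.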