CEIR: Concept-based Explainable Image Representation Learning

Yan Cui, Shuhong Liu, Liuzhuozheng Li, Zhiyuan Yuan

2023-12-17

Abstract: In modern machine learning, the trend of harnessing self-supervised learning to derive high-quality representations without label dependency has garnered significant attention. However, the absence of label information, coupled with the inherently high-dimensional nature of the learned features, increases the difficulty of interpreting these representations. Consequently, indirect evaluations have become the prevailing way to assess the quality of these features, which leads to a biased validation of the rationale behind the learned representation. To address these challenges, we introduce a novel approach termed Concept-based Explainable Image Representation (CEIR). First, using the Concept-based Model (CBM) combined with pretrained CLIP and concepts generated by GPT-4, we project input images into a concept vector space. Next, a Variational Autoencoder (VAE) learns a latent representation from these projected concepts, and this latent serves as the final image representation. Because the representation encapsulates high-level, semantically relevant concepts, the model allows attributions to a human-comprehensible concept space. This not only enhances interpretability but also preserves the robustness essential for downstream tasks. For instance, our method exhibits state-of-the-art unsupervised clustering performance on benchmarks such as CIFAR10, CIFAR100, and STL10. Furthermore, capitalizing on the universality of human conceptual understanding, CEIR can seamlessly extract the related concepts from open-world images without fine-tuning. This offers a fresh approach to automatic label generation and label manipulation.

Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper aims to address the issue of interpretability of image representations in self-supervised learning. Specifically:

1. **Interpretability Issue**: In modern machine learning, the trend of using self-supervised learning to generate high-quality representations is becoming increasingly significant. However, the lack of label information combined with the high-dimensional feature space makes the learned representations difficult to interpret. As a result, indirect evaluation methods have become mainstream, which may lead to biased validation of the reasonableness of the learned representations.

2. **Proposed Solution**: To tackle these challenges, the authors introduce a new method called "Concept-based Explainable Image Representation (CEIR)." By combining the pre-trained CLIP model and concepts generated by GPT-4, the input images are mapped to a concept vector space. A Variational Autoencoder (VAE) is then used to learn latent representations from these projected concepts, serving as the final image representation. This approach not only enhances the interpretability of the representations but also retains the robustness required for downstream tasks.

3. **Experimental Results**: On benchmark datasets such as CIFAR10, CIFAR100, and STL10, CEIR demonstrates state-of-the-art unsupervised clustering performance. It can seamlessly extract relevant concepts from open-world images without fine-tuning, providing a new method for automatic label generation and manipulation.
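The two-stage pipeline described above (project an image into a concept space, then learn a VAE latent over the concept vector) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes the concept projection is a cosine similarity between an image embedding and one text embedding per concept, which the summary does not specify. The embeddings, concept names, and the encoder outputs `mu` and `log_var` are hypothetical toy values standing in for CLIP features, GPT-4 generated concepts, and a trained VAE encoder.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project_to_concepts(image_emb, concept_embs):
    """Concept vector: one similarity score per concept (assumed projection)."""
    return [cosine(image_emb, c) for c in concept_embs]


def top_concepts(concept_vec, names, k=2):
    """Names of the k concepts with the highest scores, for attribution."""
    ranked = sorted(zip(names, concept_vec), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def reparameterize(mu, log_var, rng):
    """Standard VAE sampling step: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), the usual VAE regularization term."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


# Hypothetical stand-ins for CLIP text embeddings of generated concepts.
concept_names = ["has wings", "has wheels", "has fur"]
concept_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical stand-in for a CLIP image embedding.
image_emb = [0.9, 0.1, 0.3]

concept_vec = project_to_concepts(image_emb, concept_embs)
best = top_concepts(concept_vec, concept_names, k=2)

# Placeholder encoder outputs; a trained VAE encoder would compute these
# from concept_vec.
mu, log_var = [0.0, 0.0], [0.0, 0.0]
z = reparameterize(mu, log_var, random.Random(0))
kl = kl_to_standard_normal(mu, log_var)
```

In this toy setup the concept vector is what makes the representation inspectable: each latent `z` is derived from scores tied to named concepts, so an attribution step can report concepts such as those in `best` rather than anonymous feature dimensions.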

Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency

Explaining Classifiers with Causal Concept Effect (CaCE)

Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis

I-CEE: Tailoring Explanations of Image Classification Models to User Expertise

CEIL: Generalized Contextual Imitation Learning

Language-Informed Visual Concept Learning

Curious Representation Learning for Embodied Intelligence

Learning Socially Embedded Visual Representation from Scratch

Learning Bottleneck Concepts in Image Classification

Accurate Explanation Model for Image Classifiers using Class Association Embedding

What Does a Model Really Look at?: Extracting Model-Oriented Concepts for Explaining Deep Neural Networks

Explaining Explainability: Understanding Concept Activation Vectors

General Knowledge Embedded Image Representation Learning

Concept Visualization: Explaining the CLIP Multi-modal Embedding Using WordNet

Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts

Spatial-temporal Concept Based Explanation of 3D ConvNets.

An inherently interpretable deep learning model for local explanations using visual concepts

Explainable Image Recognition via Enhanced Slot-attention Based Classifier

Exploring Visual Engagement Signals for Representation Learning