CEIR: Concept-based Explainable Image Representation Learning

Yan Cui, Shuhong Liu, Liuzhuozheng Li, Zhiyuan Yuan
2023-12-17
Abstract: In modern machine learning, the trend of harnessing self-supervised learning to derive high-quality representations without label dependency has garnered significant attention. However, the absence of label information, coupled with the inherently high-dimensional nature of these representations, makes them harder to interpret. Consequently, indirect evaluations have become the prevailing way to assess the quality of these features, leading to a biased validation of the learned representation rationale. To address these challenges, we introduce a novel approach termed Concept-based Explainable Image Representation (CEIR). First, using the Concept-based Model (CBM) combined with pretrained CLIP and concepts generated by GPT-4, we project input images into a concept vector space. Next, a Variational Autoencoder (VAE) learns the latent representation from these projected concepts, which serves as the final image representation. Because the representation encapsulates high-level, semantically relevant concepts, the model allows for attributions to a human-comprehensible concept space. This not only enhances interpretability but also preserves the robustness essential for downstream tasks. For instance, our method exhibits state-of-the-art unsupervised clustering performance on benchmarks such as CIFAR10, CIFAR100, and STL10. Furthermore, capitalizing on the universality of human conceptual understanding, CEIR can seamlessly extract the related concepts from open-world images without fine-tuning. This offers a fresh approach to automatic label generation and label manipulation.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper aims to address the issue of interpretability of image representations in self-supervised learning. Specifically:

1. **Interpretability Issue**: In modern machine learning, the trend of using self-supervised learning to generate high-quality representations is becoming increasingly significant. However, the lack of label information combined with the high-dimensional feature space makes the learned representations difficult to interpret. As a result, indirect evaluation methods have become mainstream, which may lead to biased validation of the reasonableness of the learned representations.
2. **Proposed Solution**: To tackle these challenges, the authors introduce a new method called "Concept-based Explainable Image Representation (CEIR)." By combining the pre-trained CLIP model and concepts generated by GPT-4, the input images are mapped to a concept vector space. A Variational Autoencoder (VAE) is then used to learn latent representations from these projected concepts, serving as the final image representation. This approach not only enhances the interpretability of the representations but also retains the robustness required for downstream tasks.
3. **Experimental Results**: On benchmark datasets such as CIFAR10, CIFAR100, and STL10, CEIR demonstrates state-of-the-art unsupervised clustering performance. It can seamlessly extract relevant concepts from open-world images without fine-tuning, providing a new method for automatic label generation and manipulation.
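
The two-stage idea summarized above (map an image into a concept space, then encode the concept vector with a VAE) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are hand-made toy vectors standing in for CLIP outputs, the concept names are hypothetical stand-ins for GPT-4-generated concepts, cosine similarity is assumed as the projection, and the encoder outputs `mu` and `logvar` are placeholder values rather than the result of a trained network. All function names are illustrative.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project_to_concepts(image_emb, concept_embs):
    """Score an image embedding against every concept embedding (assumed projection)."""
    return [cosine(image_emb, c) for c in concept_embs]


def top_concepts(scores, names, k=2):
    """Return the k concept names with the highest scores, for attribution."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def reparameterize(mu, logvar, rng):
    """Standard VAE sampling step: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, sigma^2) and N(0, I), the usual VAE regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


# Hypothetical concepts and toy embeddings (stand-ins for GPT-4 and CLIP outputs).
concept_names = ["has wings", "has wheels", "has fur"]
concept_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.3]

# Stage 1: image -> concept vector.
scores = project_to_concepts(image_emb, concept_embs)
print(top_concepts(scores, concept_names))  # -> ['has wings', 'has fur']

# Stage 2: concept vector -> latent code. A trained encoder would produce
# mu and logvar from `scores`; the values below are placeholders.
mu, logvar = [0.5, -0.2], [-1.0, -1.0]
z = reparameterize(mu, logvar, random.Random(0))
print(len(z), kl_to_standard_normal(mu, logvar) >= 0.0)
```

Because each coordinate of the concept vector corresponds to a named concept, the ranking step gives a human-readable attribution for the input, which is the property the summary highlights; the latent code `z` would then be what a downstream clustering method consumes.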