Abstract: With the growing demand for interpretable deep learning models, this paper introduces Integrative CAM, an advanced Class Activation Mapping (CAM) technique aimed at providing a holistic view of feature importance across Convolutional Neural Networks (CNNs). Traditional gradient-based CAM methods, such as Grad-CAM and Grad-CAM++, primarily use final layer activations to highlight regions of interest, often neglecting critical features derived from intermediate layers. Integrative CAM addresses this limitation by fusing insights across all network layers, leveraging both gradient and activation scores to adaptively weight layer contributions, thus yielding a comprehensive interpretation of the model's internal representation. Our approach includes a novel bias term in the saliency map calculation, a factor frequently omitted in existing CAM techniques, but essential for capturing a more complete feature importance landscape, as modern CNNs rely on both weighted activations and biases to make predictions. Additionally, we generalize the alpha term from Grad-CAM++ to apply to any smooth function, expanding CAM applicability across a wider range of models. Through extensive experiments on diverse and complex datasets, Integrative CAM demonstrates superior fidelity in feature importance mapping, effectively enhancing interpretability for intricate fusion scenarios and complex decision-making tasks. By advancing interpretability methods to capture multi-layered model insights, Integrative CAM provides a valuable tool for fusion-driven applications, promoting the trustworthy and insightful deployment of deep learning models.

What problem does this paper attempt to address?

The main problem this paper addresses is improving the interpretability of convolutional neural networks (CNNs), especially on complex, high-dimensional data. Specifically, it proposes Integrative CAM (I-CAM), an improved class activation mapping (CAM) technique that aims to provide a comprehensive explanation of a CNN's internal representations.

### Main problems:

1. **Limitations of existing CAM methods**:
   - Traditional gradient-based CAM methods (such as Grad-CAM and Grad-CAM++) rely mainly on the activations of the final layer to highlight regions of interest, often ignoring key features from intermediate layers.
   - Because these methods usually generate the class activation map from the final convolutional layer only, the map has low spatial resolution and cannot capture fine-grained features.
   - Existing methods offer no guidance on which layers to visualize, which makes the interpretation process more complex and less consistent.
2. **Impact of ignoring the bias term**:
   - Existing CAM methods usually ignore the bias term, although modern CNNs rely on biases as well as weighted activations for prediction. Ignoring the bias term may lead to less accurate CAMs.
3. **Need for multi-layer fusion**:
   - A more comprehensive explanation requires a method that fuses information from all network layers and adapts to differences in layer importance, so that the explanation stays accurate.

### Solutions:

- **Integrative CAM (I-CAM)**:
  - I-CAM explains the model's internal representations by fusing information from all network layers, adaptively weighting each layer's contribution using both gradients and activation scores.
  - A new bias term is introduced into the saliency map calculation to capture a more complete feature importance landscape.
  - The alpha term from Grad-CAM++ is generalized so that it applies to any smooth function, extending CAM to a wider range of models.

### Formula summary:

- The final classification score \( \hat{y}_c \) includes the bias term:
  \[ \hat{y}_c = \sum_{k} w_k \left( \sum_{i,j} A^{l,k}_{ij} \right) + b_c \]
  where \( w_k \) is the weight of the \( k \)-th feature map, \( A^{l,k}_{ij} \) is the activation at position \( (i,j) \) of the \( k \)-th feature map in layer \( l \), and \( b_c \) is the bias term.
- The class activation map \( M_c \) of I-CAM is computed as:
  \[ M_c = \sum_{l \in L'} \alpha_l M^l_c \]
  where \( L' \) is the set of layers selected according to their importance scores, \( \alpha_l \) is the weight of the \( l \)-th layer, and \( M^l_c \) is the CAM of the \( l \)-th layer.

Through these improvements, I-CAM can provide more accurate and comprehensive feature importance mapping in complex CNN architectures, thereby enhancing the interpretability and credibility of the model.
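The two formulas above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the top-k rule used to build \( L' \), the normalization of the importance scores into \( \alpha_l \), and the nearest-neighbour upsampling are all assumptions made for illustration, since the summary does not specify how layers are selected, weighted, or resized.

```python
import numpy as np


def class_score(feature_maps, weights, bias):
    """y_c = sum_k w_k * (sum_{i,j} A_k[i, j]) + b_c for a (K, H, W) array."""
    pooled = np.asarray(feature_maps, dtype=float).sum(axis=(1, 2))  # shape (K,)
    return float(np.asarray(weights, dtype=float) @ pooled + bias)


def upsample_nearest(cam, out_hw):
    """Resize a 2-D map to out_hw by nearest-neighbour index lookup."""
    h, w = cam.shape
    out_h, out_w = out_hw
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return cam[np.ix_(rows, cols)]


def fuse_layer_cams(layer_cams, layer_scores, out_hw, top_k):
    """M_c = sum_{l in L'} alpha_l * M_c^l.

    Assumptions (hypothetical, not taken from the paper):
    - L' is the top_k layers ranked by a positive importance score;
    - alpha_l is that score normalized to sum to 1 over L'.
    """
    scores = np.asarray(layer_scores, dtype=float)
    selected = np.argsort(scores)[::-1][:top_k]        # indices of layers in L'
    alphas = scores[selected] / scores[selected].sum()  # layer weights alpha_l
    fused = np.zeros(out_hw, dtype=float)
    for alpha, idx in zip(alphas, selected):
        fused += alpha * upsample_nearest(np.asarray(layer_cams[idx], dtype=float), out_hw)
    return fused, selected, alphas


if __name__ == "__main__":
    # Two 2x2 feature maps of ones, weights 0.5 and 1.5, bias 0.25.
    print(class_score(np.ones((2, 2, 2)), [0.5, 1.5], 0.25))

    # Three per-layer CAMs at different resolutions, fused at 4x4.
    cams = [np.ones((2, 2)), 2 * np.ones((4, 4)), 5 * np.ones((1, 1))]
    fused, selected, alphas = fuse_layer_cams(cams, [1.0, 3.0, 0.5], (4, 4), top_k=2)
    print(selected, alphas, fused.shape)
```

In practice the per-layer maps \( M^l_c \) and the importance scores would come from the network's gradients and activations; here they are plain arrays so that the fusion step can be read in isolation.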

Integrative CAM: Adaptive Layer Fusion for Comprehensive Interpretation of CNNs

Exclusive Feature Constrained Class Activation Mapping for Better Visual Explanation.

Statistic-CAM: A Gradient-Free Visual Explanations for Deep Convolutional Network

UnionCAM: enhancing CNN interpretability through denoising, weighted fusion, and selective high-quality class activation mapping

Integrated Grad-CAM: Sensitivity-Aware Visual Explanation of Deep Convolutional Networks via Integrated Gradient-Based Scoring

DecomCAM: Advancing Beyond Saliency Maps through Decomposition and Integration

Feature Activation Map: Visual Explanation of Deep Learning Models for Image Classification

Grad++ScoreCAM: Enhancing Visual Explanations of Deep Convolutional Networks Using Incremented Gradient and Score-Weighted Methods

FD-CAM: Improving Faithfulness and Discriminability of Visual Explanation for CNNs

Grad-CAM: Why did you say that?

Master-CAM: Multi-scale fusion guided by Master map for high-quality class activation maps

Generalizing GradCAM for Embedding Networks

Feature CAM: Interpretable AI in Image Classification

CAManim: Animating end-to-end network activation maps

Cluster-CAM: Cluster-weighted visual interpretation of CNNs' decision in image classification

Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs

Adapting Grad-CAM for Embedding Networks

CAM-loss: Towards Learning Spatially Discriminative Feature Representations

CR-CAM: Generating explanations for deep neural networks by contrasting and ranking features

LFI-CAM: Learning Feature Importance for Better Visual Explanation

G-CAM: Graph Convolution Network Based Class Activation Mapping for Multi-label Image Recognition.