MANet: Mixed Attention Network for Visual Explanation

Jingjing Bai, Yoshinobu Kawahara

DOI: https://doi.org/10.1007/s00354-024-00252-7

2024-05-24

New Generation Computing

Abstract: Various visual explanation methods, such as CAM and Grad-CAM, have been proposed to visualize and interpret predictions made by CNNs. Recent efforts go beyond mere visual interpretability, aiming to enhance CNN performance through the utilization of these generated visual explanations. In this work, we propose MANet (Mixed Attention Network), a network architecture that advances the stability of visual explanations through an adaptive feature refinement mechanism via a mixed attention module. Concurrently, the generated attention maps are harnessed to bolster network performance in image recognition tasks. Experimental findings underscore the efficacy of MANet, demonstrating improved visual stability and consistency. The proposed architecture not only surpasses baseline models in image classification and object detection tasks but also establishes a novel paradigm for synergizing visual interpretability and network performance enhancement.

computer science, theory & methods, hardware & architecture

What problem does this paper attempt to address?

The paper addresses two core issues:

1. **Improving the interpretability of Convolutional Neural Networks (CNNs)**: The paper proposes a new network architecture, the Mixed Attention Network (MANet), which generates more stable and consistent visual explanations of how a CNN arrives at its predictions in image recognition tasks.
2. **Enhancing the performance of CNNs in image recognition tasks**: MANet also uses the generated attention maps to improve the network's performance in tasks such as image classification and object detection.

The paper's key contributions are:

- **Innovative Architecture**: MANet integrates a mixed attention module for adaptive feature refinement, improving both image recognition accuracy and the quality of the visual explanations.
- **Validation of the Mixed Attention Module**: Comprehensive ablation studies validate the module and show how different architectural choices and design elements affect the overall performance of MANet.
- **Simultaneous Improvement in Performance and Interpretability**: MANet enhances network performance while generating reliable, coherent, and consistent visual explanations, so a single framework meets the demands for both accuracy and interpretability.

In summary, the paper aims to improve the interpretability of CNN models and enhance their performance in image recognition tasks by introducing MANet and its mixed attention mechanism.
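For readers unfamiliar with the attention maps mentioned above, the following is a minimal sketch of how a CAM-style map can be computed and then used to reweight features. It is not MANet's mixed attention module, whose internals are not described here. `class_activation_map` follows the standard CAM formulation (a class-weighted sum of the last convolutional feature maps); `refine_features` is a hypothetical helper that assumes a residual reweighting of the kind used in Attention Branch Network. All names, shapes, and the min-max normalization are assumptions made for illustration.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: class-weighted sum of the last conv feature maps.

    features:   array of shape (C, H, W)
    fc_weights: array of shape (num_classes, C), the classifier weights
                applied after global average pooling
    """
    # Contract the channel axis: sum_c w[class_idx, c] * features[c] -> (H, W)
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Min-max normalize to [0, 1] for visualization (an illustrative choice).
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def refine_features(features, attention):
    """Hypothetical residual reweighting: f' = f * (1 + attention).

    Assumed here for illustration; not taken from the MANet paper.
    """
    return features * (1.0 + attention[None, :, :])

# Toy example: 2 channels on a 3x3 grid, 2 classes.
features = np.arange(18, dtype=float).reshape(2, 3, 3)
fc_weights = np.array([[1.0, 0.0], [0.5, 0.5]])
cam = class_activation_map(features, fc_weights, class_idx=0)
refined = refine_features(features, cam)
```

The residual form keeps the original features where the attention is zero and amplifies them where it is high, which is one common way to feed an explanation map back into the network.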

Exclusive Feature Constrained Class Activation Mapping for Better Visual Explanation

UniVisNet: A Unified Visualization and Classification Network for accurate of from MRI

Attention Branch Network: Learning of Attention Mechanism for Visual Explanation

Statistic-CAM: A Gradient-Free Visual Explanations for Deep Convolutional Network

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

Improving Network Interpretability via Explanation Consistency Evaluation

Attention Backpropagation

Multiattention Network for Semantic Segmentation of Fine-Resolution Remote Sensing Images

Embedding deep networks into visual explanations

Where is the Model Looking At?--Concentrate and Explain the Network Attention

TAME: Attention Mechanism Based Feature Fusion for Generating Explanation Maps of Convolutional Neural Networks

ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition

MANet: a multi-level aggregation network for semantic segmentation of high-resolution remote sensing images

Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

Knowledge-Aware Neuron Interpretation for Scene Classification

T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers

CMANet: Cross-Modality Attention Network for Indoor-Scene Semantic Segmentation

Research on image classification based on residual group multi-scale enhanced attention network

Automated Natural Language Explanation of Deep Visual Neurons with Large Models

Towards Visual Explanations for Convolutional Neural Networks via Input Resampling