MANet: Mixed Attention Network for Visual Explanation

Jingjing Bai, Yoshinobu Kawahara
DOI: https://doi.org/10.1007/s00354-024-00252-7
2024-05-24
New Generation Computing
Abstract: Various visual explanation methods, such as CAM and Grad-CAM, have been proposed to visualize and interpret predictions made by CNNs. Recent efforts go beyond mere visual interpretability, aiming to enhance CNN performance through the utilization of these generated visual explanations. In this work, we propose MANet (Mixed Attention Network), a network architecture that advances the stability of visual explanations through an adaptive feature refinement mechanism via a mixed attention module. Concurrently, the generated attention maps are harnessed to bolster network performance in image recognition tasks. Experimental findings underscore the efficacy of MANet, demonstrating improved visual stability and consistency. The proposed architecture not only surpasses baseline models in image classification and object detection tasks but also establishes a novel paradigm for synergizing visual interpretability and network performance enhancement.
computer science, theory & methods, hardware & architecture
What problem does this paper attempt to address?
The paper addresses two core issues:

1. **Improving the interpretability of convolutional neural networks (CNNs)**: The paper proposes a new network architecture, the Mixed Attention Network (MANet), that generates more stable and consistent visual explanations of how a CNN arrives at its predictions in image recognition tasks.
2. **Enhancing the performance of CNNs in image recognition tasks**: MANet uses the attention maps it generates to improve performance on tasks such as image classification and object detection.

The paper's key contributions are:

- **Architecture**: MANet integrates a mixed attention module for adaptive feature refinement, improving both image recognition accuracy and the quality of visual explanations.
- **Validation of the mixed attention module**: Comprehensive ablation studies show how different architectural choices and design elements affect the overall performance of MANet.
- **Simultaneous improvement in performance and interpretability**: MANet both improves network performance and produces reliable, coherent, and consistent visual explanations, meeting the demands for accuracy and interpretability within a single framework.

In summary, the paper aims to improve the interpretability of CNN models and their performance in image recognition tasks by introducing MANet and its mixed attention mechanism.
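This summary does not describe the internals of the mixed attention module, so the following is only a generic illustration of attention-based feature refinement, not MANet's actual design. It assumes a simple combination of per-channel and per-pixel weights with no learned parameters (real attention modules learn theirs), and every function name is hypothetical. It shows how one attention map can both rescale features and serve as the visual explanation.

```python
import math


def sigmoid(x):
    """Squash a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features):
    """Return one weight per channel, computed from that channel's global average.

    `features` is a nested list with shape C x H x W.
    """
    weights = []
    for channel in features:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def spatial_attention(features):
    """Return an H x W map with one weight per pixel, from the mean across channels."""
    n_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            sigmoid(sum(features[c][i][j] for c in range(n_channels)) / n_channels)
            for j in range(width)
        ]
        for i in range(height)
    ]


def refine(features):
    """Rescale every activation by its channel weight and its pixel weight.

    Returns the refined features and the spatial map, which can be
    visualized as a heat map over the input.
    """
    channel_weights = channel_attention(features)
    attention_map = spatial_attention(features)
    refined = [
        [
            [value * channel_weights[c] * attention_map[i][j]
             for j, value in enumerate(row)]
            for i, row in enumerate(channel)
        ]
        for c, channel in enumerate(features)
    ]
    return refined, attention_map


# Toy input: 2 channels, 2 x 2 pixels (illustrative values only).
features = [
    [[1.0, -1.0], [0.0, 2.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
refined, attention_map = refine(features)
```

Because every weight lies strictly between 0 and 1, the refinement can only attenuate activations, and pixels with a higher cross-channel mean keep more of their signal. In a trained network the weights would come from learned layers rather than from fixed averages.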