Abstract: This study presents a deep learning approach that aims to improve both the performance and the interpretability of image classification models by combining attention mechanisms with convolutional neural networks. The model is trained in a supervised manner on garbage classification data from public datasets. It uses Grad-CAM (Gradient-weighted Class Activation Mapping) together with the SE (Squeeze-and-Excitation) channel attention mechanism to generate heat maps that visualize the image regions the model focuses on during classification. These heat maps offer a way to explain the model's classification decisions and to better understand the basis of its decisions for different categories of images. To improve accuracy, attention modules are added to different stages of the ResNet50 (Residual Network-50) network. We observe that, within a stage, the modules share the same structure and require the same attention, so only one attention module is added per stage, which reduces the learning burden on the network and speeds up training. To simplify the attention computation, a global tensor stores the attention of each stage, which avoids recomputing it in every module. Experimental results show that the proposed method outperforms traditional convolutional neural networks on garbage classification tasks. By combining the attention mechanism with heat-map interpretation, the model improves classification accuracy. This is significant for image classification in practical applications and helps advance deep learning research on interpretability.
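
The three ingredients named in the abstract (SE channel attention, one cached attention vector per stage, and a Grad-CAM heat map) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the `stage_attention` dictionary standing in for the "global tensor", the reduction ratio, and the random weights and gradients are all assumptions made for illustration. A real model would use learned weights and gradients obtained by backpropagating the class score.

```python
import numpy as np

def se_attention(features, w1, w2):
    """Squeeze-and-Excitation channel weights for a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); r is the reduction ratio.
    """
    squeezed = features.mean(axis=(1, 2))        # squeeze: global average pooling
    hidden = np.maximum(w1 @ squeezed, 0.0)      # excitation: linear layer + ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # linear layer + sigmoid, shape (C,)

# Hypothetical cache standing in for the "global tensor" of per-stage attention.
stage_attention = {}

def apply_stage_attention(stage, features, w1, w2):
    """Compute attention once per stage, then reuse it for later blocks."""
    if stage not in stage_attention:
        stage_attention[stage] = se_attention(features, w1, w2)
    return features * stage_attention[stage][:, None, None]

def grad_cam(activations, gradients):
    """Grad-CAM heat map from (C, H, W) activations and class-score gradients."""
    alpha = gradients.mean(axis=(1, 2))          # per-channel importance weights
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)  # ReLU of weighted sum
    peak = cam.max()
    return cam / peak if peak > 0 else cam       # normalise to [0, 1]

# Toy data: random features, weights, and gradients (placeholders, not real model outputs).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feats = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))

scaled = apply_stage_attention("stage1", feats, w1, w2)
# A second block in the same stage reuses the cached attention instead of recomputing it.
scaled_again = apply_stage_attention("stage1", rng.standard_normal((C, H, W)), w1, w2)
heat = grad_cam(scaled, rng.standard_normal((C, H, W)))
```

In this sketch the second call for `"stage1"` skips the squeeze and excitation steps entirely, which is the saving the abstract attributes to storing attention per stage. Whether the original work shares attention in exactly this way is not stated in the abstract.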

Attention Branch Network: Learning of Attention Mechanism for Visual Explanation

Exclusive Feature Constrained Class Activation Mapping for Better Visual Explanation

ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition

ACNET: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation

MANet: Mixed Attention Network for Visual Explanation

Attention Backpropagation

TAME: Attention Mechanism Based Feature Fusion for Generating Explanation Maps of Convolutional Neural Networks

Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA

Towards A Comprehensive Visual Saliency Explanation Framework for AI-based Face Recognition Systems

Embedding deep networks into visual explanations

Saliency Prediction on Omnidirectional Images with Brain-Like Shallow Neural Network

EMBANet: A Flexible Efficient Multi-branch Attention Network

Deep Learning Methods With the Improved Attention for Explainable Image Recognition

Better Deep Visual Attention with Reinforcement Learning in Action Recognition

LFI-CAM: Learning Feature Importance for Better Visual Explanation

Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification

Where is the Model Looking At?--Concentrate and Explain the Network Attention

Loss-Based Attention for Interpreting Image-Level Prediction of Convolutional Neural Networks

Attend and Guide (AG-Net): A Keypoints-driven Attention-based Deep Network for Image Recognition

Automated Natural Language Explanation of Deep Visual Neurons with Large Models

Top-Down Neural Attention by Excitation Backprop