Abstract: Deep reinforcement learning has become popular over recent years, showing superiority on different visual-input tasks such as playing Atari games and robot navigation. Although objects are important image elements, little work has considered enhancing deep reinforcement learning with object characteristics. In this paper, we propose a novel method that can incorporate object recognition processing into deep reinforcement learning models. This approach can be adapted to any existing deep reinforcement learning framework. State-of-the-art results are shown in experiments on Atari games. We also propose a new approach called "object saliency maps" to visually explain the actions made by deep reinforcement learning agents.
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve
This paper aims to address two main issues in deep reinforcement learning (DRL) when handling visual tasks involving multiple objects:
1. **Improving Performance**: Existing deep reinforcement learning models typically operate on raw pixels and do not exploit object characteristics (such as the presence and location of objects) when handling visual tasks involving multiple objects. This leads to suboptimal performance in certain tasks. To improve this, the authors propose a new method, Object-sensitive Deep Reinforcement Learning (O-DRL), which enhances the model's performance by incorporating object recognition during the learning process.
2. **Interpretability**: Current deep reinforcement learning models lack interpretability: they cannot provide human-understandable explanations for their decisions, so when a model takes a certain action, people cannot follow the logic behind it. To address this issue, the authors propose a new method, Object Saliency Maps, which generates visual explanations to illustrate why the model chooses a particular action.
### Main Contributions
1. **Incorporating Object Features**: The authors propose a method to incorporate object features (such as the presence and location of objects) into deep reinforcement learning models, improving the model's performance in various Atari games.
2. **Generating Object-level Visual Explanations**: The authors propose the Object Saliency Maps method, which can generate object-level visual explanations to help users understand the model's decision-making process.
3. **Experimental Validation**: Experiments on multiple Atari games show that the proposed method outperforms existing methods and provides meaningful explanations.
### Method Overview
1. **Object-sensitive Deep Reinforcement Learning Model**:
- **Object Channels**: By adding object channels to the original image's RGB channels, object features are encoded into the input. Each object channel represents a type of object, with detected object pixels assigned a value of 1 and other pixels assigned a value of 0.
- **Network Architecture**: The object channels are concatenated with the original image input and passed through a Convolutional Neural Network (CNN), which extracts features and predicts the Q-value of each action. This method can be applied to different deep reinforcement learning frameworks, such as DQN, DDQN, and A3C (where the network outputs a policy and a state value rather than Q-values).
2. **Object Saliency Maps**:
- **Pixel Saliency Maps**: The concept of pixel saliency maps is introduced first, generating pixel-level saliency maps by calculating the derivative of the Q-value function with respect to the state image.
- **Object Saliency Maps**: To generate object-level explanations, each object is occluded in turn to form a new state, and the Q-value of the occluded state is subtracted from the Q-value of the original state. This difference measures the object's impact on the Q-value: a positive difference (the Q-value drops when the object is removed) indicates a "good" object, while a negative difference indicates a "bad" object.
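
The two ideas above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the bounding-box detection format `(type_id, top, left, height, width)`, the helper names `add_object_channels` and `object_saliency`, the `toy_q` stand-in for a trained network, and the choice to occlude an object by painting it with a background value and dropping its object channel are all hypothetical.

```python
import numpy as np

def add_object_channels(frame, detections, num_types):
    """Append one binary channel per object type to an H x W x 3 frame.

    detections: list of (type_id, top, left, height, width) boxes
    (a hypothetical format chosen for this sketch).
    """
    h, w, _ = frame.shape
    channels = np.zeros((h, w, num_types), dtype=frame.dtype)
    for type_id, top, left, bh, bw in detections:
        # Pixels covered by a detected object get 1, all others stay 0.
        channels[top:top + bh, left:left + bw, type_id] = 1
    return np.concatenate([frame, channels], axis=-1)

def object_saliency(q_fn, frame, detections, num_types, action, background=0.0):
    """Return Q(s, a) - Q(s with object occluded, a) for each detection."""
    base = q_fn(add_object_channels(frame, detections, num_types))[action]
    scores = []
    for i, (type_id, top, left, bh, bw) in enumerate(detections):
        occluded = frame.copy()
        # Assumption: occlusion paints the object with a background value.
        occluded[top:top + bh, left:left + bw, :] = background
        remaining = detections[:i] + detections[i + 1:]
        q = q_fn(add_object_channels(occluded, remaining, num_types))[action]
        scores.append(base - q)
    return scores

def toy_q(state):
    """Stand-in for a trained network: rewards type-0 pixels, penalizes type-1."""
    good = state[..., 3].sum()
    bad = state[..., 4].sum()
    return np.array([good - bad, 0.0])

frame = np.zeros((8, 8, 3), dtype=np.float32)
detections = [(0, 1, 1, 2, 2), (1, 4, 4, 2, 3)]  # one "good" box, one "bad" box
state = add_object_channels(frame, detections, num_types=2)
scores = object_saliency(toy_q, frame, detections, num_types=2, action=0)
print(state.shape)                 # (8, 8, 5)
print([float(s) for s in scores])  # [4.0, -6.0]
```

With the toy Q-function, removing the type-0 object lowers the Q-value (positive saliency, a "good" object), while removing the type-1 object raises it (negative saliency, a "bad" object).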
### Experimental Results
- **Object Recognition Effectiveness**: Object recognition is performed using a template matching method, with precision consistently at 1 and F1 scores above 0.9, indicating accurate extraction of object channels.
- **Performance Improvement**: In multiple Atari games, the O-DRL model outperforms traditional DRL models, with a 20% performance improvement in the Ms. Pacman game.
- **Case Study**: A detailed analysis of the Ms. Pacman game demonstrates the effectiveness of the O-DRL model and the advantage of Object Saliency Maps in explaining the model's decisions.
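
To make the object recognition metrics concrete, the sketch below shows how a precision of 1 can coexist with an F1 score below 1: the detector never reports a false object but misses some true ones. It uses a naive exact-match sliding window as a stand-in for template matching; the function names, the toy frame, and the "corrupted sprite" scenario are assumptions for illustration, and the paper's actual matcher and data are not reproduced here.

```python
import numpy as np

def match_template(frame, template):
    """Return (top, left) positions where the template matches the frame exactly."""
    fh, fw = frame.shape[:2]
    th, tw = template.shape[:2]
    hits = []
    for top in range(fh - th + 1):
        for left in range(fw - tw + 1):
            if np.array_equal(frame[top:top + th, left:left + tw], template):
                hits.append((top, left))
    return hits

def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 over sets of detected positions."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

template = np.array([[1, 2], [3, 4]])
frame = np.zeros((6, 6), dtype=int)
frame[0:2, 0:2] = template               # clean sprite
frame[3:5, 3:5] = template               # clean sprite
frame[0:2, 3:5] = [[1, 2], [3, 0]]       # sprite with one corrupted pixel

truth = [(0, 0), (3, 3), (0, 3)]
hits = match_template(frame, template)
precision, recall, f1 = precision_recall_f1(hits, truth)
print(hits)       # [(0, 0), (3, 3)]
print(precision)  # 1.0
```

Here every reported match is correct (precision 1.0), but the corrupted sprite is missed, so recall is 2/3 and F1 is about 0.8.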
### Conclusion and Future Work
- **Conclusion**: Incorporating object features can significantly improve the performance of deep reinforcement learning models, and Object Saliency Maps provide interpretability.
- **Future Work**: Explore how to use Object Saliency Maps to generate natural language explanations and apply object features in more realistic tasks, such as autonomous driving.