Abstract: Modern vision-based reinforcement learning techniques often use convolutional neural networks (CNN) as universal function approximators to choose which action to take for a given visual input. Until recently, CNNs have been treated like black-box functions, but this mindset is especially dangerous when used for control in safety-critical settings. In this paper, we present our extensions of CNN visualization algorithms to the domain of vision-based reinforcement learning. We use a simulated drone environment as an example scenario. These visualization algorithms are an important tool for behavior introspection and provide insight into the qualities and flaws of trained policies when interacting with the physical world. A video may be seen at https://sites.google.com/view/drlvisual.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **how to understand the behavior of deep reinforcement learning (DRL) policies based on convolutional neural networks (CNNs) through visualization techniques, especially in safety-critical applications involving physical systems**. Specifically, the paper focuses on:

1. **The black-box problem of CNNs**: Modern vision-based reinforcement learning techniques usually use CNNs as general function approximators that select actions given visual inputs. However, CNNs have long been treated as black-box models that are difficult to debug and interpret. This is especially dangerous in physical systems that require high reliability and safety.
2. **The need for behavior verification**: Reinforcement learning trains policies by trial and error. The learned policies must therefore be rigorously verified before they are deployed on safety-critical physical systems.
3. **The shortcomings of existing visualization techniques**: Existing CNN visualization techniques were developed mainly for static image classification. They cannot be applied directly to the time-series data and stochastic decision-making of reinforcement learning.

To address these problems, the paper extends three existing CNN visualization techniques (t-SNE, class visualization, and attribution visualization) to vision-based reinforcement learning. It evaluates them experimentally in a simulated drone environment. These techniques help researchers and engineers understand the decision-making process of CNN policies and identify their strengths and weaknesses. As a result, the policies become more interpretable and trustworthy.

### Main contributions:

- **Adaptive extension**: Adapts t-SNE, class visualization, and attribution visualization to vision-based reinforcement learning.
- **Experimental verification**: Uses a simulated task in which a drone collects cubes to demonstrate these visualization tools on policies at different performance levels.
- **Future work directions**: Identifies opportunities for further research, such as real-time visualization and the handling of time-series data.

### Formula summary:

- **Optimization objective**: Find the CNN parameters \(\theta^*\) that maximize the cumulative reward:
  \[ \theta^*=\arg\max_{\theta}\sum_{t=0}^{T-1}r(s_t,a_t) \]
  where each episode consists of the \(T\) time steps \(t = 0, \dots, T-1\).
- **Class visualization optimization problem**: Generate the input image that maximizes the probability of a specific action \(a\):
  \[ s^*=\arg\max_s\pi_\theta(a\mid s) \]
- **Grad-CAM calculation**: Compute the importance weight \(\alpha_k\) of each feature-map channel:
  \[ \alpha_k=\frac{1}{Z}\sum_i\sum_j\frac{\partial\pi_\theta(a\mid s)}{\partial A_k^{ij}} \]
  where \(A\) is the feature map of the target convolutional layer, \(A_k\) is its \(k\)-th channel, \(A_k^{ij}\) is the activation at spatial position \((i, j)\), and \(Z\) is the number of spatial positions in \(A_k\) (its height times its width).

Together, these techniques give researchers a toolset for understanding and improving vision-based reinforcement learning policies.
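The class visualization objective \(s^* = \arg\max_s \pi_\theta(a\mid s)\) can be illustrated with gradient ascent on the input image. The following is a minimal, hypothetical sketch, not the paper's implementation. It makes three assumptions:

- The CNN is replaced by a toy linear-softmax policy (`weights`, shape actions × pixels), so the input gradient can be written by hand.
- Pixel values are assumed to lie in [0, 1].
- It ascends \(\log \pi_\theta(a\mid s)\), which has the same maximizer as \(\pi_\theta(a\mid s)\) but a better-scaled gradient.

With a real CNN policy, the input gradient would come from an autodiff framework. Practical class visualization also often adds a regularizer (for example an L2 penalty on the image).

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def visualize_action(weights, a, shape, steps=200, lr=0.1, seed=0):
    """Projected gradient ascent on the input s to maximize pi(a|s).

    Hypothetical stand-in for the CNN policy: pi(.|s) = softmax(weights @ s),
    where `weights` has shape (num_actions, num_pixels).
    """
    rng = np.random.default_rng(seed)
    # Start from a slightly noisy mid-gray image.
    s = rng.uniform(0.4, 0.6, size=int(np.prod(shape)))
    for _ in range(steps):
        p = softmax(weights @ s)
        # Gradient of log pi(a|s) w.r.t. s for a linear-softmax policy:
        # w_a - sum_b p_b * w_b.
        grad = weights[a] - p @ weights
        # Take an ascent step, then clip back to the assumed pixel range [0, 1].
        s = np.clip(s + lr * grad, 0.0, 1.0)
    return s.reshape(shape), softmax(weights @ s)[a]


# Usage: 3 hypothetical actions over an 8x8 grayscale "observation".
rng = np.random.default_rng(1)
weights = 0.1 * rng.normal(size=(3, 8 * 8))
s_star, p_star = visualize_action(weights, a=2, shape=(8, 8))
p_gray = softmax(weights @ np.full(8 * 8, 0.5))[2]
print(f"pi(a=2 | gray) = {p_gray:.3f}, pi(a=2 | s*) = {p_star:.3f}")
```

For this toy policy, \(\log \pi(a\mid s)\) is concave in \(s\), so the clipped ascent climbs toward the best image inside the pixel box. For a CNN the objective is non-convex, and the result is only a local maximizer that depends on the starting image.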
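The Grad-CAM weight \(\alpha_k\) is the spatial average, over channel \(A_k\), of the gradient of \(\pi_\theta(a\mid s)\). The following minimal sketch computes it with NumPy and rests on two assumptions:

- A hypothetical toy policy head (a linear-softmax layer on the flattened feature map) stands in for the real network, so that \(\partial \pi_\theta(a\mid s)/\partial A\) has a closed form.
- The heat map is formed as \(\mathrm{ReLU}(\sum_k \alpha_k A_k)\), as in standard Grad-CAM. The summary itself gives only \(\alpha_k\).

With a real CNN policy, `A` and `dA` would instead be captured from the target convolutional layer during a forward and backward pass in a deep learning framework.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def policy_probs(A, weights):
    """Hypothetical policy head: pi(.|s) = softmax(weights @ flatten(A))."""
    return softmax(weights @ A.ravel())


def grad_prob_wrt_features(A, weights, a):
    """Closed-form d pi(a|s) / dA for the toy head: p_a * (w_a - sum_b p_b w_b)."""
    p = policy_probs(A, weights)
    return (p[a] * (weights[a] - p @ weights)).reshape(A.shape)


def grad_cam(A, dA):
    """Grad-CAM for feature maps A of shape (K, H, W) and gradients dA of the same shape."""
    K, H, W = A.shape
    Z = H * W  # number of spatial positions (i, j) per channel
    # alpha_k = (1/Z) * sum_i sum_j d pi(a|s) / dA_k^ij
    alpha = dA.reshape(K, Z).sum(axis=1) / Z
    # Weighted sum of channels, then ReLU: heat map of shape (H, W).
    cam = np.maximum(np.tensordot(alpha, A, axes=1), 0.0)
    return alpha, cam


# Usage: 4 channels of a 5x5 feature map and 3 hypothetical actions.
rng = np.random.default_rng(0)
A = rng.random((4, 5, 5))
weights = 0.1 * rng.normal(size=(3, A.size))
dA = grad_prob_wrt_features(A, weights, a=1)
alpha, cam = grad_cam(A, dA)
print("channel weights:", np.round(alpha, 4))
print("heat map shape:", cam.shape)
```

The ReLU keeps only regions whose activation increases the probability of action \(a\). In practice the \(H \times W\) heat map is then upsampled to the input resolution and overlaid on the observation.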

Visual Diagnostics for Deep Reinforcement Learning Policy Development