Visual Diagnostics for Deep Reinforcement Learning Policy Development

Jieliang Luo, Sam Green, Peter Feghali, George Legrady, Çetin Kaya Koç
DOI: https://doi.org/10.48550/arXiv.1809.06781
2018-09-27
Abstract: Modern vision-based reinforcement learning techniques often use convolutional neural networks (CNNs) as universal function approximators to choose which action to take for a given visual input. Until recently, CNNs have been treated like black-box functions, but this mindset is especially dangerous when used for control in safety-critical settings. In this paper, we present our extensions of CNN visualization algorithms to the domain of vision-based reinforcement learning. We use a simulated drone environment as an example scenario. These visualization algorithms are an important tool for behavior introspection and provide insight into the qualities and flaws of trained policies when interacting with the physical world. A video may be seen at https://sites.google.com/view/drlvisual.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper asks **how visualization techniques can expose the behavior of deep reinforcement learning (DRL) policies built on convolutional neural networks (CNNs), especially in safety-critical applications involving physical systems**. Specifically, it focuses on:

1. **The black-box problem of CNNs**: Modern vision-based reinforcement learning techniques usually use CNNs as general function approximators to select an action for a given visual input. CNNs have long been treated as black-box models that are difficult to debug and interpret, which is especially dangerous in physical systems that require high reliability and safety.
2. **The need for behavior verification**: Because reinforcement learning trains by trial and error, learned policies must undergo strict behavior verification before they are deployed on safety-critical physical systems.
3. **The shortcomings of existing visualization techniques**: Existing CNN visualization techniques were developed mainly for static image classification and cannot be applied directly to the time-series data and stochastic decision-making of reinforcement learning.

To address these problems, the paper extends three existing CNN visualization techniques (t-SNE, class visualization, and attribution visualization) to vision-based reinforcement learning and evaluates them in a simulated drone environment. These techniques help researchers and engineers understand how a CNN policy makes decisions and identify its strengths and weaknesses, which improves the interpretability and credibility of the policy.

### Main contributions:
- **Adaptive extension**: Adapts t-SNE, class visualization, and attribution visualization to the vision-based reinforcement learning setting.
- **Experimental verification**: Uses a task in which a simulated drone collects cubes to show how the visualization tools behave on policies at different performance levels.
- **Future work directions**: Points out opportunities for further research, such as real-time visualization and the processing of time-series data.

### Formula summary:
- **Optimization objective**: Find the CNN parameters \(\theta^*\) that maximize the cumulative reward: \[ \theta^*=\arg\max_{\theta}\sum_{t = 0}^{T - 1}r(s_t,a_t) \] where \(r(s_t,a_t)\) is the reward for taking action \(a_t\) in state \(s_t\), and the sum runs over the \(T\) time steps of an episode, \(t = 0, \dots, T - 1\).
- **Class visualization optimization problem**: Generate the input image \(s^*\) that maximizes the probability the policy assigns to a chosen action \(a\): \[ s^*=\arg\max_s\pi_\theta(a|s) \]
- **Grad-CAM calculation**: Calculate the importance weight \(\alpha_k\) of each feature map channel: \[ \alpha_k=\frac{1}{Z}\sum_i\sum_j\frac{\partial\pi_\theta(a|s)}{\partial A_k^{ij}} \] where \(A\) is the feature map of the target convolutional layer, \(A_k\) is its \(k\)-th channel, \(A_k^{ij}\) is the activation at position \((i, j)\), and \(Z\) is the number of spatial positions in a channel (its height times its width).

Together, these techniques give researchers and engineers a toolset for understanding and improving vision-based reinforcement learning policies.
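
The Grad-CAM weighting in the formula summary can be made concrete with a small sketch. The code below is illustrative only and is not the authors' implementation. It assumes a hypothetical toy policy head (`toy_policy_head`: global average pooling, one linear layer, softmax) and made-up `feature_maps` and `weights`, and it approximates the partial derivatives with central finite differences instead of backpropagation so that it needs only the Python standard library. The final heatmap step, a ReLU over the \(\alpha_k\)-weighted sum of channels, comes from the standard Grad-CAM formulation rather than from the summary above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_policy_head(feature_maps, weights):
    """Hypothetical policy head: average-pool each channel, then linear + softmax.

    feature_maps: K channels, each an H x W nested list (the A_k of the formula).
    weights: one row of K floats per action.
    Returns the action probabilities pi(a|s).
    """
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_maps]
    logits = [sum(w * g for w, g in zip(w_row, pooled)) for w_row in weights]
    return softmax(logits)

def grad_cam_weights(feature_maps, action, head, eps=1e-5):
    """alpha_k = (1/Z) * sum_ij d pi(a|s) / d A_k^{ij}.

    Derivatives are approximated by central differences (an assumption of
    this sketch; a real CNN would use backpropagation).
    """
    alphas = []
    for channel in feature_maps:
        height, width = len(channel), len(channel[0])
        z = height * width  # number of spatial positions in the channel
        grad_sum = 0.0
        for i in range(height):
            for j in range(width):
                original = channel[i][j]
                channel[i][j] = original + eps
                p_plus = head(feature_maps)[action]
                channel[i][j] = original - eps
                p_minus = head(feature_maps)[action]
                channel[i][j] = original  # restore the activation
                grad_sum += (p_plus - p_minus) / (2 * eps)
        alphas.append(grad_sum / z)
    return alphas

def grad_cam_map(feature_maps, alphas):
    """Standard Grad-CAM heatmap: ReLU of the alpha-weighted channel sum."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[max(0.0, sum(a * ch[i][j] for a, ch in zip(alphas, feature_maps)))
             for j in range(width)]
            for i in range(height)]

# Made-up example: K = 2 channels of size 2 x 2, and 2 actions.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 fires at the top-left
    [[0.0, 0.0], [0.0, 2.0]],  # channel 1 fires at the bottom-right
]
weights = [
    [1.0, -1.0],   # action 0 is favored by channel 0
    [-1.0, 1.0],   # action 1 is favored by channel 1
]

head = lambda maps: toy_policy_head(maps, weights)
alphas = grad_cam_weights(feature_maps, action=0, head=head)
heatmap = grad_cam_map(feature_maps, alphas)
print("alpha_k:", alphas)
print("heatmap:", heatmap)
```

In this toy setup, the channel that supports action 0 gets a positive weight and the competing channel gets a negative one, so the heatmap is non-zero only where channel 0 is active. With a real policy network, `head` would be replaced by the layers after the target convolutional layer and the gradients would come from the framework's automatic differentiation.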