Foresee: Attentive Future Projections of Chaotic Road Environments with Online Training

Anil Sharma, Prabhat Kumar
DOI: https://doi.org/10.48550/arXiv.1805.11861
2018-05-30
Abstract: In this paper, we train a recurrent neural network to learn the dynamics of a chaotic road environment and to project the future of the environment on an image. Future projection can be used to anticipate an unseen environment, for example in autonomous driving. The road environment is highly dynamic and complex due to the interaction among traffic participants such as vehicles and pedestrians. Even in this complex environment, a human driver is able to drive safely on chaotic roads irrespective of the number of traffic participants. The proliferation of deep learning research has shown the efficacy of neural networks in learning this human behavior. In the same direction, we investigate recurrent neural networks to understand the chaotic road environment, which is shared by pedestrians, vehicles (cars, trucks, bicycles, etc.), and sometimes animals as well. We propose \emph{Foresee}, a unidirectional gated recurrent unit (GRU) network with attention, to project the future of the environment in the form of images. We have collected several videos on Delhi roads covering various traffic participants and differences in background and infrastructure (like 3D pedestrian crossings) at various times on various days. We train \emph{Foresee} in an unsupervised way and use online training to project frames up to $0.5$ seconds in advance. We show that our proposed model performs better than state-of-the-art methods (PredNet and encoder-decoder LSTM) and, finally, that our trained model generalizes to a public dataset for future projections.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the prediction of future scenes in complex, chaotic road environments. Specifically, the authors train a recurrent neural network (RNN) to learn the dynamics of the road environment and to predict its future state in the form of images. Such future projection can be used to anticipate unseen situations, for example to sense potential traffic conditions in advance in autonomous driving.

### Main Problems

1. **Complex and Chaotic Road Environments**: The road environment is highly dynamic and complex, especially because of the interactions among pedestrians, vehicles (cars, trucks, bicycles, etc.), and other possible participants such as animals.
2. **Simulation of Human Driver Behavior**: Despite this complexity, human drivers are still able to drive safely on chaotic roads. The authors aim to imitate this ability using deep learning.
3. **Limitations of Existing Methods**: Existing prediction methods handle uncertainty and dynamic change poorly, especially in environments without clear rules (such as the roads of Delhi, India).

### Solutions

The authors propose a deep-learning architecture named Foresee, which uses a unidirectional gated recurrent unit (GRU) network with an attention mechanism to predict future images of the environment. Trained on video collected from real roads and updated through online training, Foresee projects frames up to 0.5 seconds ahead.

### Specific Contributions

1. **Proposing a New Deep-Learning Architecture**: Foresee combines GRUs with an attention mechanism and outperforms existing methods (PredNet and encoder-decoder LSTM) at future projection.
2. **Constructing a Large-Scale Real-Road Data Set**: The data set includes 101 videos from Delhi roads, covering a variety of traffic scenarios.
3. **Exploring the Effects of Online Training**: Through online training, Foresee adapts better to new environments and improves its prediction performance.
4. **Verifying the Generalization Ability of the Model**: Foresee has been tested on a public data set (KITTI), where it shows good generalization.

### Summary of Mathematical Formulas

The formulas in the paper mainly describe the GRU computation and its extension with an attention mechanism. The key formulas are listed below. Here $x_t$ is the input at time $t$, $h_{t-1}$ is the previous hidden state, $\sigma$ is the logistic sigmoid, and $\odot$ denotes element-wise multiplication.

#### GRU Unit Calculation

- Reset Gate: \[ r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \]
- Update Gate: \[ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \]
- Candidate Hidden State: \[ n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})) \]
- Hidden State Update: \[ h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1} \]

#### Attention Mechanism

- Energy Calculation: \[ e_{ij} = \tanh(\text{Mul}(O_t, W) + b) \]
- Attention Weights: \[ a_i = \exp(e_{ij}) \]
- Context Vector: \[ C_t = \text{Softmax}(a_i) \]
- Weighted Output: \[ O^{\text{wt}}_t=
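
As a rough illustration of the GRU unit calculation listed above, the following pure-Python sketch applies the reset gate, update gate, candidate state, and hidden state update to a short input sequence. It uses the standard GRU form, in which the bias b_hn appears once, inside the reset-gated term. The helper names (`matvec`, `gru_step`, `init_params`), the layer sizes, and the random initialization are assumptions made for this example only; they are not taken from the paper, and the sketch does not include the attention mechanism or the image encoder and decoder that Foresee would need.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU update: returns h_t given input x_t and previous state h_prev.

    p is a dict of hypothetical parameter names mirroring the formulas:
    W_ir, W_iz, W_in (input weights), W_hr, W_hz, W_hn (hidden weights),
    and the matching biases b_ir, b_iz, b_in, b_hr, b_hz, b_hn.
    """
    wx = {g: matvec(p["W_i" + g], x_t) for g in "rzn"}
    wh = {g: matvec(p["W_h" + g], h_prev) for g in "rzn"}
    h_t = []
    for k in range(len(h_prev)):
        # Reset gate r_t and update gate z_t
        r = sigmoid(wx["r"][k] + p["b_ir"][k] + wh["r"][k] + p["b_hr"][k])
        z = sigmoid(wx["z"][k] + p["b_iz"][k] + wh["z"][k] + p["b_hz"][k])
        # Candidate state n_t: the reset gate scales the hidden contribution
        n = math.tanh(wx["n"][k] + p["b_in"][k] + r * (wh["n"][k] + p["b_hn"][k]))
        # Hidden state update: interpolate between candidate and previous state
        h_t.append((1.0 - z) * n + z * h_prev[k])
    return h_t


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (illustrative initialization)."""
    rng = random.Random(seed)
    p = {}
    for g in "rzn":
        p["W_i" + g] = [[rng.gauss(0, 0.1) for _ in range(input_size)]
                        for _ in range(hidden_size)]
        p["W_h" + g] = [[rng.gauss(0, 0.1) for _ in range(hidden_size)]
                        for _ in range(hidden_size)]
        p["b_i" + g] = [0.0] * hidden_size
        p["b_h" + g] = [0.0] * hidden_size
    return p


# Run the cell over a toy sequence of five 4-dimensional inputs.
params = init_params(input_size=4, hidden_size=3)
h = [0.0, 0.0, 0.0]
for x in [[1.0, 1.0, 1.0, 1.0]] * 5:
    h = gru_step(x, h, params)
```

Because each new state is a convex combination of a `tanh` output and the previous state, every component of `h` stays strictly inside (-1, 1) when the initial state is zero. In practice a library implementation would normally be used instead of a hand-written cell.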