KARNet: Kalman Filter Augmented Recurrent Neural Network for Learning World Models in Autonomous Driving Tasks

Hemanth Manjunatha, Andrey Pak, Dimitar Filev, Panagiotis Tsiotras
DOI: https://doi.org/10.48550/arXiv.2305.14644
2023-05-24
Abstract: Autonomous driving has received a great deal of attention in the automotive industry and is often seen as the future of transportation. The development of autonomous driving technology has been greatly accelerated by the growth of end-to-end machine learning techniques that have been successfully used for perception, planning, and control tasks. An important aspect of autonomous driving planning is knowing how the environment evolves in the immediate future and taking appropriate actions. An autonomous driving system should effectively use the information collected from the various sensors to form an abstract representation of the world to maintain situational awareness. For this purpose, deep learning models can be used to learn compact latent representations from a stream of incoming data. However, most deep learning models are trained end-to-end and do not incorporate any prior knowledge (e.g., from physics) of the vehicle in the architecture. In this direction, many works have explored physics-infused neural network (PINN) architectures to infuse physics models during training. Inspired by this observation, we present a Kalman filter augmented recurrent neural network architecture to learn the latent representation of the traffic flow using front camera images only. We demonstrate the efficacy of the proposed model in both imitation and reinforcement learning settings using both simulated and real-world datasets. The results show that incorporating an explicit model of the vehicle (states estimated using Kalman filtering) in the end-to-end learning significantly increases performance.
Machine Learning, Robotics
What problem does this paper attempt to address?
The paper addresses how to predict the near-future state of traffic flow in autonomous driving tasks and make appropriate driving decisions based on that prediction. Specifically, it proposes KARNet (Kalman Filter Augmented Recurrent Neural Network), a recurrent neural network (RNN) architecture augmented with a Kalman filter, which learns a latent representation of the traffic flow from the vehicle's front-camera images. In this way, KARNet aims to improve the system's ability to anticipate changes in its environment and, in turn, its performance in complex traffic.

### Main Problems

1. **Predicting the future state of traffic flow**: An autonomous driving system must anticipate how its surroundings will change in the immediate future so that it can act in time. KARNet does this by learning latent representations from front-camera images that capture the key features of the traffic flow.
2. **Integrating physics models with data-driven models**: Most deep learning models are trained end-to-end and contain no prior knowledge (such as physical laws). KARNet estimates the vehicle state with a Kalman filter and combines that estimate with a data-driven neural network architecture to improve prediction accuracy.

### Solutions

- **KARNet Architecture**: KARNet combines an autoencoder (AE), a gated recurrent unit (GRU), and a Kalman filter. The AE learns a latent representation from each image, the GRU predicts future latent variables, and the Kalman filter estimates the vehicle state.
- **Early Fusion and Late Fusion**: The paper explores two ways of integrating vehicle state information into KARNet: early fusion and late fusion.
  Early fusion concatenates the corrected state and the predicted state with the latent vector before the GRU block. Late fusion feeds the concatenation of the corrected state and the latent vector into the GRU, then concatenates the predicted state with the hidden state output by the GRU.
- **Loss Function**: The multi-scale structural similarity index measure (MS-SSIM) is used as a loss function to preserve the structure of the reconstructed images, and it also acts as a form of regularization.

### Experimental Verification

The paper evaluates KARNet in imitation learning and reinforcement learning experiments on both simulated and real-world datasets. The results show that incorporating an explicit vehicle model (states estimated with Kalman filtering) into end-to-end learning significantly improves performance.

### Conclusion

By combining physics models with data-driven models, KARNet improves the autonomous driving system's ability to predict future traffic flow and therefore performs better in complex traffic environments.
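The Kalman filter and the two fusion schemes described under Solutions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The constant-velocity vehicle model, the noise covariances, the dimensions (`LATENT_DIM`, `STATE_DIM`, `HIDDEN_DIM`), the randomly initialised GRU weights, and the helper names `kalman_step`, `GRUCell`, `early_fusion`, and `late_fusion` are all hypothetical choices made for illustration. A random vector stands in for the autoencoder's latent output.

```python
import numpy as np

# ASSUMPTION: a 1-D constant-velocity vehicle model, state x = [position, velocity].
# The paper's actual vehicle model and noise settings are not reproduced here.
DT = 0.1                                  # assumed time step (s)
F = np.array([[1.0, DT], [0.0, 1.0]])     # state transition
H = np.array([[1.0, 0.0]])                # only position is measured
Q = 0.01 * np.eye(2)                      # process noise covariance
R = np.array([[0.25]])                    # measurement noise covariance


def kalman_step(x, P, z):
    """Correct prior (x, P) with measurement z, then predict one step ahead.

    Returns the corrected state, the predicted state, and the predicted covariance.
    """
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_corr = x + K @ (z - H @ x)          # measurement update
    P_corr = (np.eye(len(x)) - K @ H) @ P
    x_pred = F @ x_corr                   # time update
    P_pred = F @ P_corr @ F.T + Q
    return x_corr, x_pred, P_pred


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class GRUCell:
    """Minimal GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(hidden_dim)
        # Stacked weights for the update (z), reset (r) and candidate (n) gates.
        self.W = rng.uniform(-scale, scale, (3 * hidden_dim, input_dim))
        self.U = rng.uniform(-scale, scale, (3 * hidden_dim, hidden_dim))
        self.b = np.zeros(3 * hidden_dim)
        self.hidden_dim = hidden_dim

    def __call__(self, x, h):
        d = self.hidden_dim
        wx, uh = self.W @ x + self.b, self.U @ h
        z = sigmoid(wx[:d] + uh[:d])
        r = sigmoid(wx[d:2 * d] + uh[d:2 * d])
        n = np.tanh(wx[2 * d:] + r * uh[2 * d:])
        return (1.0 - z) * n + z * h


LATENT_DIM, STATE_DIM, HIDDEN_DIM = 8, 2, 16   # hypothetical sizes
rng = np.random.default_rng(0)
early_gru = GRUCell(LATENT_DIM + 2 * STATE_DIM, HIDDEN_DIM, rng)
late_gru = GRUCell(LATENT_DIM + STATE_DIM, HIDDEN_DIM, rng)


def early_fusion(latent, x_corr, x_pred, h):
    # Corrected and predicted states are concatenated with the latent vector BEFORE the GRU.
    h_new = early_gru(np.concatenate([latent, x_corr, x_pred]), h)
    return h_new, h_new                   # (next hidden state, fused feature)


def late_fusion(latent, x_corr, x_pred, h):
    # Only [latent, corrected state] enters the GRU; the predicted state is appended AFTER it.
    h_new = late_gru(np.concatenate([latent, x_corr]), h)
    return h_new, np.concatenate([h_new, x_pred])


# Usage: one step with a dummy latent vector in place of the autoencoder output.
x, P = np.zeros(STATE_DIM), np.eye(STATE_DIM)
x_corr, x_pred, P = kalman_step(x, P, z=np.array([1.0]))
latent = rng.standard_normal(LATENT_DIM)
h0 = np.zeros(HIDDEN_DIM)
_, feat_early = early_fusion(latent, x_corr, x_pred, h0)
_, feat_late = late_fusion(latent, x_corr, x_pred, h0)
print(feat_early.shape, feat_late.shape)  # (16,) (18,)
```

Under these assumptions, early fusion widens the GRU input by both state vectors, whereas late fusion keeps the GRU input smaller and passes the predicted state directly to whatever consumes the GRU output.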
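For reference, the multi-scale SSIM mentioned under Loss Function is commonly defined over M scales as shown below, where x is the input image and x̂ its reconstruction. The statistics at scale j (local means μ, standard deviations σ, and cross-covariance σ_{x x̂}) are computed on the images after j - 1 rounds of low-pass filtering and downsampling by 2, and C_1, C_2, C_3 are small stabilising constants. The summary does not state the number of scales, the exponents, or how the measure is turned into a loss; writing the reconstruction loss as 1 - MS-SSIM is an assumption here, chosen because it is zero for a perfect reconstruction.

```latex
\mathrm{MS\text{-}SSIM}(\mathbf{x},\hat{\mathbf{x}})
  = \bigl[l_M(\mathbf{x},\hat{\mathbf{x}})\bigr]^{\alpha_M}
    \prod_{j=1}^{M} \bigl[c_j(\mathbf{x},\hat{\mathbf{x}})\bigr]^{\beta_j}
                    \bigl[s_j(\mathbf{x},\hat{\mathbf{x}})\bigr]^{\gamma_j}

l_j = \frac{2\mu_{x}\mu_{\hat{x}} + C_1}{\mu_{x}^2 + \mu_{\hat{x}}^2 + C_1}, \quad
c_j = \frac{2\sigma_{x}\sigma_{\hat{x}} + C_2}{\sigma_{x}^2 + \sigma_{\hat{x}}^2 + C_2}, \quad
s_j = \frac{\sigma_{x\hat{x}} + C_3}{\sigma_{x}\sigma_{\hat{x}} + C_3}

% Assumed loss form (not specified in the summary):
\mathcal{L}_{\mathrm{rec}}(\mathbf{x},\hat{\mathbf{x}}) = 1 - \mathrm{MS\text{-}SSIM}(\mathbf{x},\hat{\mathbf{x}})
```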