Abstract: Autonomous driving has received a great deal of attention in the automotive industry and is often seen as the future of transportation. The development of autonomous driving technology has been greatly accelerated by the growth of end-to-end machine learning techniques that have been successfully used for perception, planning, and control tasks. An important aspect of autonomous driving planning is knowing how the environment evolves in the immediate future and taking appropriate actions. An autonomous driving system should effectively use the information collected from the various sensors to form an abstract representation of the world to maintain situational awareness. For this purpose, deep learning models can be used to learn compact latent representations from a stream of incoming data. However, most deep learning models are trained end-to-end and do not incorporate any prior knowledge (e.g., from physics) of the vehicle in the architecture. In this direction, many works have explored physics-infused neural network (PINN) architectures to infuse physics models during training. Inspired by this observation, we present a Kalman filter augmented recurrent neural network architecture to learn the latent representation of the traffic flow using front camera images only. We demonstrate the efficacy of the proposed model in both imitation and reinforcement learning settings using both simulated and real-world datasets. The results show that incorporating an explicit model of the vehicle (states estimated using Kalman filtering) in the end-to-end learning significantly increases performance.

What problem does this paper attempt to address?

The paper addresses how to predict the near-future state of traffic flow in autonomous driving tasks and make appropriate driving decisions based on that prediction. It proposes KARNet (Kalman Filter Augmented Recurrent Neural Network), a recurrent neural network (RNN) architecture augmented with a Kalman filter, which learns a latent representation of traffic flow from the vehicle's front-camera images. KARNet aims to improve the autonomous driving system's ability to anticipate changes in its environment, and thereby its performance in complex traffic.

### Main Problems

1. **Predicting the future state of traffic flow**: The autonomous driving system needs to predict how its surroundings will change in the immediate future so that it can act in time. KARNet does this by learning latent representations from front-camera images that capture the key features of the traffic flow.
2. **Integrating physical models and data-driven models**: Most deep learning models are trained end-to-end and contain no prior knowledge such as physical laws. KARNet estimates the vehicle state with a Kalman filter and feeds that estimate into a data-driven neural network architecture to improve prediction accuracy.

### Solutions

- **KARNet Architecture**: KARNet combines an autoencoder (AE), a gated recurrent unit (GRU), and a Kalman filter. The AE learns the latent representation from the image, the GRU predicts the future latent variables, and the Kalman filter estimates the vehicle state.
- **Early Fusion and Late Fusion**: The paper explores two ways of integrating vehicle state information into KARNet. Early fusion concatenates both the corrected state and the predicted state with the latent vector before the GRU block. Late fusion feeds the concatenation of the corrected state and the latent vector into the GRU, then concatenates the predicted state with the GRU's hidden state at the output.
- **Loss Function**: The multi-scale structural similarity index measure (MS-SSIM) is used as a loss function to preserve the structure of the reconstructed image and to act as a regularizer.

### Experimental Verification

The paper evaluates KARNet in imitation learning and reinforcement learning experiments on simulated and real-world datasets. The results show that integrating an explicit vehicle model (the state estimated by Kalman filtering) into end-to-end learning significantly improves performance.

### Conclusion

By combining physical models with data-driven models, KARNet improves the autonomous driving system's ability to predict future traffic flow and thus performs better in complex traffic environments.
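
The early- and late-fusion wiring described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the dimensions, the constant-velocity motion model (`F`, `Q`, `H`, `R`), the random untrained GRU weights, and all function names are hypothetical. It also assumes that the "corrected state" is the Kalman filter's posterior estimate at the current step and that the "predicted state" is its one-step-ahead prior.

```python
import numpy as np

# Hypothetical sizes; the summary does not state the paper's dimensions.
LATENT_DIM, STATE_DIM, HIDDEN_DIM = 8, 4, 16
DT = 0.1  # assumed time step in seconds

# Assumed constant-velocity model, state = [px, py, vx, vy]; position is measured.
F = np.eye(STATE_DIM)
F[0, 2] = F[1, 3] = DT
Q = 0.01 * np.eye(STATE_DIM)          # process noise covariance
H = np.eye(2, STATE_DIM)              # picks out [px, py]
R = 0.1 * np.eye(2)                   # measurement noise covariance

def kf_predict(x, P):
    """Kalman predict step: propagate state and covariance one step ahead."""
    return F @ x, F @ P @ F.T + Q

def kf_correct(x_pred, P_pred, z):
    """Kalman correct step: blend the prediction with measurement z."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(STATE_DIM) - K @ H) @ P_pred
    return x, P

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def make_gru(input_dim, hidden_dim, rng):
    """Random, untrained GRU weights stacked as [update, reset, candidate]."""
    W = rng.normal(0.0, 0.1, (3, hidden_dim, input_dim))
    U = rng.normal(0.0, 0.1, (3, hidden_dim, hidden_dim))
    b = np.zeros((3, hidden_dim))
    return W, U, b

def gru_step(params, x, h):
    """One step of a standard GRU cell."""
    W, U, b = params
    z = sigmoid(W[0] @ x + U[0] @ h + b[0])
    r = sigmoid(W[1] @ x + U[1] @ h + b[1])
    c = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])
    return (1.0 - z) * h + z * c

def early_fusion(params, latent, x_corr, x_pred, h):
    """Both vehicle states join the latent vector before the GRU."""
    gru_in = np.concatenate([latent, x_corr, x_pred])
    return gru_step(params, gru_in, h)

def late_fusion(params, latent, x_corr, x_pred, h):
    """Corrected state enters the GRU; predicted state joins its output."""
    h_next = gru_step(params, np.concatenate([latent, x_corr]), h)
    return h_next, np.concatenate([h_next, x_pred])

# Demo with placeholder data standing in for the autoencoder's latent vector.
rng = np.random.default_rng(0)
latent = rng.normal(size=LATENT_DIM)
x_corr, P_corr = kf_correct(np.zeros(STATE_DIM), np.eye(STATE_DIM),
                            np.array([1.0, 0.5]))
x_pred, _ = kf_predict(x_corr, P_corr)
h0 = np.zeros(HIDDEN_DIM)

gru_early = make_gru(LATENT_DIM + 2 * STATE_DIM, HIDDEN_DIM, rng)
gru_late = make_gru(LATENT_DIM + STATE_DIM, HIDDEN_DIM, rng)
h_early = early_fusion(gru_early, latent, x_corr, x_pred, h0)
h_late, fused_late = late_fusion(gru_late, latent, x_corr, x_pred, h0)
print(h_early.shape, h_late.shape, fused_late.shape)
```

In this sketch the two variants differ only in where the predicted state enters: early fusion widens the GRU input, while late fusion keeps the GRU input smaller and widens the feature vector handed to whatever layer follows the GRU.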

KARNet: Kalman Filter Augmented Recurrent Neural Network for Learning World Models in Autonomous Driving Tasks

CARNet: A Dynamic Autoencoder for Learning Latent Dynamics in Autonomous Driving Tasks

End-to-End Learning with Memory Models for Complex Autonomous Driving Tasks in Indoor Environments

Cognitive Map-Based Model: Toward a Developmental Framework for Self-Driving Cars

Neural World Models for Computer Vision

DynaNet: Neural Kalman Dynamical Model for Motion Estimation and Prediction

Beyond One Model Fits All: Ensemble Deep Learning for Autonomous Vehicles

Performance Evaluation of Deep Learning-Based State Estimation: A Comparative Study of KalmanNet

Autonomous driving in traffic with end-to-end vision-based deep learning

Evolutionary End-to-End Autonomous Driving Model with Continuous-Time Neural Networks

Physics Embedded Neural Network Vehicle Model and Applications in Risk-Aware Autonomous Driving Using Latent Features

Deep learning and control algorithms of direct perception for autonomous driving

NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving

Autonomous Vehicle Control: End-to-end Learning in Simulated Urban Environments

Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues

KalmanNet: Neural Network Aided Kalman Filtering for Partially Known Dynamics

Learning On-Road Visual Control for Self-Driving Vehicles with Auxiliary Tasks

Knowledge Distillation Neural Network for Predicting Car-following Behaviour of Human-driven and Autonomous Vehicles

PnPNet: End-to-End Perception and Prediction with Tracking in the Loop

Probabilistic End-to-End Vehicle Navigation in Complex Dynamic Environments with Multimodal Sensor Fusion

A Bayesian Driver Agent Model for Autonomous Vehicles System Based on Knowledge-Aware and Real-Time Data