DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation

Leandro Di Bella,Yangxintong Lyu,Adrian Munteanu
DOI: https://doi.org/10.1049/ell2.13191
2024-04-25
Abstract:This paper presents DeepKalPose, a novel approach for enhancing temporal consistency in monocular vehicle pose estimation applied on video through a deep-learning-based Kalman Filter. By integrating a Bi-directional Kalman filter strategy utilizing forward and backward time-series processing, combined with a learnable motion model to represent complex motion patterns, our method significantly improves pose accuracy and robustness across various conditions, particularly for occluded or distant vehicles. Experimental validation on the KITTI dataset confirms that DeepKalPose outperforms existing methods in both pose accuracy and temporal consistency.
Computer Vision and Pattern Recognition,Artificial Intelligence,Robotics
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the temporal consistency problem of vehicle 6D pose estimation in monocular videos. Specifically, traditional image - based methods, when processing video data, process each frame independently, resulting in temporal inconsistency. Especially when dealing with occluded or distant vehicles, the predicted poses will have large fluctuations and instability (such as flickering artifacts). These problems are mainly due to the lack of constraints on inter - frame continuity and consistency in traditional methods. To solve these problems, the authors propose a new method named DeepKalPose. By combining deep learning and the Kalman Filter, it aims to improve the temporal consistency of vehicle pose estimation, thereby enhancing the accuracy and robustness of pose estimation. The following are the main contributions of this method: 1. **Bidirectional Kalman filtering strategy**: DeepKalPose adopts a bidirectional Kalman filtering strategy, using forward and backward time - series processing, enabling the image - based pose estimator to better handle video data. 2. **Learnable motion model**: A learnable motion model is introduced and integrated into the Kalman Filter, allowing the network to learn more complex and nonlinear vehicle motion patterns. 3. **Experimental verification**: The experimental results on the KITTI dataset show that DeepKalPose outperforms existing image - based techniques in both pose accuracy and temporal consistency, especially when dealing with occluded and distant vehicles. Through these improvements, DeepKalPose effectively solves the temporal inconsistency problem existing in traditional methods and improves the stability and accuracy of vehicle pose estimation.