Stereo Vision SLAM Based on Feature Extraction Network

Lei Zhang, Wenjie Na, Chenpeng Yao, Chengju Liu, Qijun Chen
DOI: https://doi.org/10.1109/piers62282.2024.10618530
2024-01-01
Abstract: With the popularization of low-cost consumer-grade robots, visual SLAM has been widely used for robot localization and navigation in indoor and outdoor environments. However, most current visual SLAM systems suffer from poor accuracy and robustness in pose estimation because they are vulnerable to feature extraction failures, mismatches, and tracking loss under illumination changes and in low-texture environments. To reduce the robot's localization error and improve its operational stability, this paper proposes a novel stereo vision SLAM system based on deep neural networks. First, because handcrafted visual features suffer from unstable extraction and inaccurate matching under illumination changes and in low-texture environments, we design a visual feature extraction module based on a feature extraction network. The module contains a differentiable keypoint detection module and uses reprojection and dispersed peak losses to train accurate and repeatable keypoints. Second, because the feature similarity measure of the bag-of-words algorithm uses Hamming distance, it cannot support the query and matching of visual features extracted by deep neural networks. We improve the algorithm and generate a new visual feature dictionary from features extracted by the deep neural network to ensure correct visual feature query and matching. Finally, to further optimize the pose estimate of the SLAM system, we build on the original visual odometry and use the redundant pose information from LiDAR to fuse the poses of the visual front end and the laser front end in a complementary manner. We evaluate the algorithm on the KITTI dataset and on a real mobile robot platform. Experimental results show that, compared with other methods, the proposed approach effectively improves the accuracy and robustness of the SLAM system, demonstrating the practicality and effectiveness of the method.
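The abstract's point about Hamming distance can be illustrated with a small sketch. Hamming distance counts differing bits, so it is defined only for binary descriptors, while descriptors produced by a neural network are typically real-valued vectors that need a metric such as Euclidean (L2) distance. The code below is an illustration under assumptions, not the paper's implementation: the function names, the 32-byte binary descriptor length, the two-dimensional float descriptors, and the flat two-word vocabulary are all hypothetical, and the abstract does not state which metric or dictionary structure the authors adopt.

```python
import numpy as np


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count differing bits between two packed binary descriptors (uint8 arrays)."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two real-valued descriptors."""
    return float(np.linalg.norm(a - b))


def nearest_word(descriptor: np.ndarray, vocabulary: np.ndarray) -> int:
    """Index of the closest visual word in a flat vocabulary under L2 distance.

    Assumption: the vocabulary is a plain (num_words, dim) array; a real
    bag-of-words dictionary is usually a tree, which is omitted here.
    """
    return int(np.argmin(np.linalg.norm(vocabulary - descriptor, axis=1)))


# Binary descriptors (assumed 32 bytes = 256 bits) that differ in three bits.
orb_a = np.zeros(32, dtype=np.uint8)
orb_b = np.zeros(32, dtype=np.uint8)
orb_b[0] = 0b00000111
print(hamming_distance(orb_a, orb_b))  # 3

# Real-valued descriptors: bit counting is meaningless, so use L2 instead.
vocab = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical two-word vocabulary
desc = np.array([0.9, 0.1])
print(nearest_word(desc, vocab))  # 0
```

The sketch only shows why a dictionary built around bit-level comparison has to be regenerated for learned features: both the stored words and the comparison function change type.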