A Visual Odometry Algorithm in Dynamic Scenes Based on Object Detection
Qifan Ye, Chaoyi Dong, Xiaoyang Liu, Liangliang Gao, Kang Zhang, Xiaoyan Chen
DOI: https://doi.org/10.1109/prai55851.2022.9904100
2022-01-01
Abstract: Traditional Visual Simultaneous Localization and Mapping (VSLAM) algorithms usually assume that the working environment is static. When moving objects are present, it becomes increasingly difficult to distinguish dynamic feature points from static ones, which leads to inaccurate camera pose estimation and low positioning accuracy. To solve the pose estimation problem of VSLAM in dynamic scenes, a Visual Odometry (VO) algorithm based on a YOLOv5s-L network (VOA-YOLOV5S-L) is proposed to eliminate the influence of dynamic feature points. First, the lightweight YOLOv5s-L object detection network is introduced to improve the VO algorithm of the traditional ORB-SLAM2. YOLOv5s-L replaces the backbone feature extraction network of YOLOv5s with the lightweight ShuffleNetV2 network, which reduces the number of model parameters, speeds up detection, and greatly improves the real-time performance of the visual object detection system. Second, while VOA-YOLOV5S-L extracts ORB feature points, dynamic objects are detected by the YOLOv5s-L network, and the feature points on those dynamic objects are eliminated using epipolar geometric constraints. Finally, the feature points on static objects are used to estimate the camera pose. On the TUM high-dynamic dataset, VOA-YOLOV5S-L improves the positioning accuracy by 97.94%, the translational pose accuracy of the camera by 94.52%, and the rotational pose accuracy of the camera by 91.29% compared with ORB-SLAM2. In addition, VOA-YOLOV5S-L runs at 31.5 frames/s, significantly faster than other classical SLAM algorithms for dynamic scenes, such as DS-SLAM and DynaSLAM. Therefore, the proposed VOA-YOLOV5S-L has the potential to meet practical engineering requirements for both accuracy and real-time performance.
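The abstract does not spell out the exact rejection rule that combines detection boxes with the epipolar constraint. As a minimal sketch only, assuming that (a) the fundamental matrix `F` between the previous and current frame has already been estimated (for example with RANSAC), (b) detector output is given as pixel boxes `(x1, y1, x2, y2)`, and (c) a matched ORB point is treated as dynamic when it lies inside a detected box *and* its distance to its epipolar line exceeds a hypothetical pixel threshold, the filtering step could look like this. The function names and the threshold value are illustrative, not the authors' code.

```python
import numpy as np


def epipolar_distances(F, pts1, pts2):
    """Distance (pixels) from each pts2[i] to the epipolar line F @ pts1[i]."""
    n = pts1.shape[0]
    p1 = np.hstack([pts1, np.ones((n, 1))])  # homogeneous points, frame 1
    p2 = np.hstack([pts2, np.ones((n, 1))])  # homogeneous points, frame 2
    lines = p1 @ F.T  # row i holds line (a, b, c) = F @ p1_i in frame 2
    num = np.abs(np.sum(lines * p2, axis=1))  # |a*x' + b*y' + c|
    den = np.sqrt(lines[:, 0] ** 2 + lines[:, 1] ** 2)
    return num / den


def in_any_box(pts, boxes):
    """True for each point that falls inside at least one (x1, y1, x2, y2) box."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    x, y = pts[:, 0:1], pts[:, 1:2]  # shape (n, 1) so they broadcast against boxes
    inside = (
        (x >= boxes[:, 0]) & (x <= boxes[:, 2])
        & (y >= boxes[:, 1]) & (y <= boxes[:, 3])
    )
    return inside.any(axis=1)


def select_static_points(F, pts1, pts2, boxes, thresh=1.0):
    """Boolean mask of matches kept as static (assumed rule: in-box AND off-line)."""
    dynamic = in_any_box(pts2, boxes) & (epipolar_distances(F, pts1, pts2) > thresh)
    return ~dynamic


if __name__ == "__main__":
    # Toy geometry: pure sideways translation with identity intrinsics, so
    # F = [t]_x with t = (1, 0, 0) and every epipolar line is horizontal.
    F = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
    pts1 = np.array([[10, 20], [30, 40], [50, 60], [200, 250]], dtype=float)
    pts2 = np.array([[15, 20], [35, 45], [55, 60.5], [200, 300]], dtype=float)
    print(epipolar_distances(F, pts1, pts2))
    print(select_static_points(F, pts1, pts2, boxes=[[0, 0, 100, 100]]))
```

Requiring both conditions keeps static background points that happen to fall inside a person's bounding box, while the last match shows that a point outside every box is kept even when it violates the constraint; whether the paper uses this conjunction, the box test alone, or a different threshold is not stated in the abstract.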