Mask R-CNN Based Semantic RGB-D SLAM for Dynamic Scenes

Zhongqun Zhang, Jingtao Zhang, Qirong Tang
DOI: https://doi.org/10.1109/aim.2019.8868400
2019-01-01
Abstract: Traditional visual SLAM algorithms run robustly under the assumption of a static environment but often fail in dynamic scenes, because moving objects impair camera pose tracking. This study proposes a semantic SLAM framework for RGB-D cameras that uses Mask R-CNN to detect potentially moving elements and thereby achieve robustness in dynamic scenes. In the framework, semantic instance segmentation runs as an independent thread in parallel with three other threads: tracking, local mapping, and loop closing. Whereas most methods use only multi-view geometry to determine whether segmented regions are moving, the proposed method simultaneously estimates the camera motion and which parts of the scene are likely to be dynamic or static. Experiments on the TUM RGB-D dataset compare the proposed method with state-of-the-art approaches. The results show that the proposed method improves absolute trajectory accuracy in dynamic scenes.
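To make the idea of "potentially moving elements" concrete, the following is a minimal sketch of the simplest way instance masks can be used in feature-based tracking: keypoints that fall inside the mask of a potentially moving class are discarded before pose estimation. This is not the paper's joint estimation of camera motion and dynamic/static likelihood, only an illustrative baseline. The class list `POTENTIALLY_MOVING`, the function name `filter_static_keypoints`, and the mask representation (nested boolean rows) are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical set of class labels treated as potentially moving
# (an assumption for illustration; the abstract does not list classes).
POTENTIALLY_MOVING = {"person", "car", "bicycle"}

Keypoint = Tuple[float, float]        # (x, y) in pixel coordinates
Mask = Sequence[Sequence[bool]]       # mask[row][col] is True inside the instance
Instance = Tuple[str, Mask]           # (class label, binary mask)


def filter_static_keypoints(
    keypoints: Sequence[Keypoint],
    instances: Sequence[Instance],
) -> List[Keypoint]:
    """Return the keypoints lying outside every potentially moving instance mask."""
    # Keep only the masks whose class label is in the potentially moving set.
    dynamic_masks = [mask for label, mask in instances if label in POTENTIALLY_MOVING]

    static: List[Keypoint] = []
    for x, y in keypoints:
        # Image convention: x indexes columns, y indexes rows.
        row, col = int(round(y)), int(round(x))
        inside_dynamic = any(
            0 <= row < len(m) and 0 <= col < len(m[0]) and m[row][col]
            for m in dynamic_masks
        )
        if not inside_dynamic:
            static.append((x, y))
    return static
```

In a pipeline of this kind, the surviving keypoints would be passed on to pose estimation, while instances of classes outside the set (for example a chair) leave the keypoints untouched.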