MAIM-VO: A Robust Visual Odometry with Mixed MLP for Weak Textured Environment

Zhiwei Shen, Bin Kong
2023-01-01
Abstract: Visual localization is a critical technology for visual SLAM systems; it determines relative position and motion trajectory by tracking feature points. In recent years, deep learning has been widely applied to visual localization. Deep learning-based methods can overcome the limitations of traditional hand-crafted feature extraction and achieve high-precision visual localization in complex scenes, supporting the goal of lifelong SLAM. MLP models are flexible and adaptable. The Mixer-WMLP exchanges token information between spatial positions by evenly dividing the feature map into non-overlapping windows, which allows it to approximate a global receptive field. Compared to CNNs and Transformers, Mixer MLPs have higher computational efficiency and robustness. In this paper, we use the Mixer MLP structure to design a deep learning-based visual odometry system called MAIM-VO, which achieves high-quality matching even in complex scenes with low-texture areas. After the matching point pairs are obtained, the camera pose is solved by optimization, minimizing the reprojection error of the feature points. Experiments on multiple datasets and in real-world environments demonstrate that MAIM-VO is more robust and achieves higher relative localization accuracy than currently popular visual SLAM systems.
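The window division mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `window_partition`, the list-of-rows feature map, and the window size are assumptions chosen for illustration, and the token-mixing MLP that would operate on each window is omitted.

```python
def window_partition(feature_map, window):
    """Split an H x W grid (a list of rows) into non-overlapping
    window x window tiles, returned in row-major order.

    Hypothetical helper for illustration only; the paper's actual
    partitioning code is not given in the abstract.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    # "Evenly dividing" requires both dimensions to be multiples of the window size.
    if height % window or width % window:
        raise ValueError("feature map size must be divisible by the window size")

    windows = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            # Take `window` consecutive rows and slice the same columns from each.
            tile = [feature_map[r][left:left + window]
                    for r in range(top, top + window)]
            windows.append(tile)
    return windows


# A 4 x 4 single-channel map with values 0..15, split into 2 x 2 windows.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = window_partition(grid, 2)
print(len(tiles))   # 4
print(tiles[0])     # [[0, 1], [4, 5]]
```

Each tile groups the tokens that would exchange information with one another; how Mixer-WMLP propagates information across windows is not specified in the abstract and is not modeled here.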