Abstract: Estimating geometric elements such as depth, camera motion, and optical flow from images is an important part of a robot's visual perception. We use a joint self-supervised method to estimate these three geometric elements. The depth, optical flow, and camera motion networks are independent of each other but are jointly optimized during training. Compared with independent training, joint training makes full use of the geometric relationships among these elements and provides both dynamic and static information about the scene. In this paper, we improve the joint self-supervised method in three respects: network structure, dynamic object segmentation, and geometric constraints. For network structure, we apply an attention mechanism to the camera motion network, which helps exploit the similarity of camera motion between frames. Following the attention mechanism in the Transformer, we also propose a plug-and-play convolutional attention module. For dynamic objects, because they affect the optical flow self-supervised framework and the depth-pose self-supervised framework differently, we propose a threshold algorithm to detect dynamic regions and mask them in the respective loss functions. For geometric constraints, we use traditional methods to estimate the fundamental matrix from corresponding points and use it to constrain the camera motion network. We demonstrate the effectiveness of our method on the KITTI dataset. Compared with other joint self-supervised methods, our method achieves state-of-the-art performance in pose and optical flow estimation and competitive results in depth estimation. Code will be available at https://github.com/jianfenglihg/Unsupervised_geometry.
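The geometric constraint mentioned in the abstract rests on the standard epipolar relation: for a correct relative pose (R, t) and intrinsics K, corresponding pixels satisfy x2^T F x1 = 0 with F = K^-T [t]x R K^-1. The NumPy sketch below only illustrates that relation. The function names, the pose convention X2 = R X1 + t, and the mean absolute residual are assumptions made for illustration, not the loss used in the paper.

```python
import numpy as np

def skew(t):
    """Return the skew-symmetric matrix [t]_x, so skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def fundamental_from_pose(K, R, t):
    """F = K^-T [t]_x R K^-1, assuming points map as X2 = R @ X1 + t."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def epipolar_residual(F, pts1, pts2):
    """Mean |x2^T F x1| over N correspondences given as (N, 2) pixel arrays."""
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])  # homogeneous pixels in frame 1
    x2 = np.hstack([pts2, ones])  # homogeneous pixels in frame 2
    return np.abs(np.sum(x2 * (x1 @ F.T), axis=1)).mean()

def project(K, X):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = X @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Synthetic check with made-up intrinsics and pose (illustrative values only).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
theta = 0.05  # small rotation about the y axis, in radians
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.0, 0.1])

rng = np.random.default_rng(0)
X1 = np.column_stack([rng.uniform(-5, 5, 50),
                      rng.uniform(-2, 2, 50),
                      rng.uniform(5, 20, 50)])
X2 = X1 @ R.T + t

pts1, pts2 = project(K, X1), project(K, X2)
residual = epipolar_residual(fundamental_from_pose(K, R, t), pts1, pts2)
print(f"epipolar residual for the true pose: {residual:.2e}")
```

With noise-free correspondences the residual for the true pose is zero up to floating-point error, while an incorrect pose yields a clearly larger value. That gap is what lets a fundamental matrix estimated from matched points act as a supervisory signal for a pose network.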

Unsupervised Learning of Depth and Ego-Motion with Absolutely Global Scale Recovery from Visual and Inertial Data Sequences

Monocular Depth Estimation Based on Unsupervised Learning

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics

Unsupervised Learning of Depth and Ego-Motion with Spatial-Temporal Geometric Constraints

Unsupervised Learning of Monocular Depth and Ego-motion in Outdoor/Indoor Environments

Self-Supervised 3D Reconstruction and Ego-Motion Estimation Via On-Board Monocular Video

Monocular Depth and Ego-motion Estimation with Scale Based on Superpixel and Normal Constraints

SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery

Towards Scale-Aware Self-Supervised Multi-Frame Depth Estimation with IMU Motion Dynamics

Unsupervised Learning of Monocular Depth and Ego-Motion Using Multiple Masks

Geometry-Aware Network for Unsupervised Learning of Monocular Camera's Ego-Motion

Cycle-SfM: Joint Self-Supervised Learning of Depth and Camera Motion from Monocular Image Sequences

Unsupervised Video Depth Estimation Based on Ego-motion and Disparity Consensus

Self-Supervised Learning of Depth and Ego-motion for 3D Perception in Human Computer Interaction

Self-Supervised Scale Recovery for Monocular Depth and Egomotion Estimation

Unsupervised Joint Learning of Depth, Optical Flow, Ego-motion from Video

Self-Supervised Learning of Depth and Ego-Motion from Videos by Alternative Training and Geometric Constraints from 3-D to 2-D

Improving Unsupervised Learning of Monocular Depth and Ego-Motion Via Stereo Network

Unsupervised Scale-Consistent Depth Learning from Video