Abstract: Estimating geometric elements such as depth, camera motion, and optical flow from images is an important part of a robot's visual perception. We use a joint self-supervised method to estimate these three geometric elements. The depth network, optical flow network, and camera motion network are independent of each other but are jointly optimized during the training phase. Compared with independent training, joint training can make full use of the geometric relationships among these elements and provides information about the dynamic and static parts of the scene. In this paper, we improve the joint self-supervised method in three aspects: network structure, dynamic object segmentation, and geometric constraints. In terms of network structure, we apply the attention mechanism to the camera motion network, which helps exploit the similarity of camera movement between frames. Following the attention mechanism in the Transformer, we also propose a plug-and-play convolutional attention module. In terms of dynamic objects, because dynamic objects affect the optical flow self-supervised framework and the depth-pose self-supervised framework differently, we propose a threshold algorithm to detect dynamic regions and mask them in the respective loss functions. In terms of geometric constraints, we use traditional methods to estimate the fundamental matrix from corresponding points and use it to constrain the camera motion network. We demonstrate the effectiveness of our method on the KITTI dataset. Compared with other joint self-supervised methods, our method achieves state-of-the-art performance in pose and optical flow estimation, and competitive results in depth estimation. Code will be available at <a class="link-external link-https" href="https://github.com/jianfenglihg/Unsupervised_geometry" rel="external noopener nofollow">this https URL</a>.
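
The abstract does not spell out the threshold algorithm or the exact form of the fundamental-matrix constraint, so the following is only an illustrative sketch of the underlying two-view geometry, not the authors' implementation. It assumes a pinhole camera with intrinsics `K` and a relative pose `(R, t)` such that a point moves as `X2 = R @ X1 + t`. It builds the fundamental matrix implied by that pose and flags correspondences (for example, pixels paired by optical flow) whose distance to the epipolar line exceeds a threshold. All function names and the default threshold of 1 pixel are hypothetical.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(K, R, t):
    """F = K^{-T} [t]_x R K^{-1}, assuming X2 = R @ X1 + t and shared intrinsics K."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def epipolar_distance(F, pts1, pts2):
    """Distance in pixels from each point in pts2 to the epipolar line of its match in pts1.

    pts1, pts2: arrays of shape (N, 2) holding pixel coordinates.
    """
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])          # homogeneous points in frame 1
    x2 = np.hstack([pts2, ones])          # homogeneous points in frame 2
    lines = x1 @ F.T                      # row i is the epipolar line F @ x1[i]
    num = np.abs(np.sum(lines * x2, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1) + 1e-12  # avoid division by zero
    return num / den

def dynamic_mask(F, pts1, pts2, threshold=1.0):
    """Hypothetical thresholding rule: True where a match violates the epipolar constraint."""
    return epipolar_distance(F, pts1, pts2) > threshold
```

A mask like this could be used to exclude flagged pixels from a rigid-scene loss. Note that motion along the epipolar line is not detected by this test, which is one reason a practical method may need additional cues.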

Self-Supervised Learning of Optical Flow, Depth, Camera Pose and Rigidity Segmentation with Occlusion Handling

Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity

Joint Self-supervised Depth and Optical Flow Estimation towards Dynamic Objects

Unsupervised Learning of Depth, Optical Flow and Pose With Occlusion From 3D Geometry

EffiScene: Efficient Per-Pixel Rigidity Inference for Unsupervised Joint Learning of Optical Flow, Depth, Camera Pose and Motion Segmentation

Self-supervised Learning of Scene Flow with Occlusion Handling Through Feature Masking

Semantic and Optical Flow Guided Self-supervised Monocular Depth and Ego-Motion Estimation

Unsupervised Learning Optical Flow in Multi-frame Dynamic Environment Using Temporal Dynamic Modeling

UFD-PRiME: Unsupervised Joint Learning of Optical Flow and Stereo Depth through Pixel-Level Rigid Motion Estimation

FlowDepth: Decoupling Optical Flow for Self-Supervised Monocular Depth Estimation

Self-Supervised Multi-Scale Hierarchical Refinement Method for Joint Learning of Optical Flow and Depth

Learning By Analogy: Reliable Supervision From Transformations For Unsupervised Optical Flow Estimation

Self-Attention-Based Multiscale Feature Learning Optical Flow with Occlusion Feature Map Prediction

EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity

Self-supervised Learning of Monocular 3D Geometry Understanding with Two- and Three-View Geometric Constraints

Unsupervised Joint Learning of Depth, Optical Flow, Ego-motion from Video

Cycle-SfM: Joint Self-Supervised Learning of Depth and Camera Motion from Monocular Image Sequences

Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry Perception with Adaptive Cross Weighted Loss from Monocular Videos

Fast Multi-frame Stereo Scene Flow with Motion Segmentation

Feature-Level Collaboration: Joint Unsupervised Learning of Optical Flow, Stereo Depth and Camera Motion

Automatic Layered RGB-D Scene Flow Estimation with Optical Flow Field Constraint