Abstract: Autonomous mobile robots like self-flying drones and industrial robots heavily depend on depth images to perform tasks such as 3D reconstruction and visual SLAM. However, the presence of inaccuracies in these depth images can greatly hinder the effectiveness of these applications, resulting in sub-optimal results. Depth images produced by commercially available cameras frequently exhibit noise, which manifests as flickering pixels and erroneous patches. ML-based methods to rectify these images are unsuitable for edge devices that have very limited computational resources. Non-ML methods are much faster but have limited accuracy, especially for correcting errors that are a result of occlusion and camera movement. We propose a scheme called VoxDepth that is fast, accurate, and runs very well on edge devices. It relies on a host of novel techniques: 3D point cloud construction and fusion, and using it to create a template that can fix erroneous depth images. VoxDepth shows superior results on both synthetic and real-world datasets. We demonstrate a 31% improvement in quality as compared to state-of-the-art methods on real-world depth datasets, while maintaining a competitive framerate of 27 FPS (frames per second).

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses inaccuracies in depth images and their impact on tasks such as 3D reconstruction and visual SLAM performed by autonomous mobile robots (for example, self-flying drones and industrial robots). Specifically, depth images produced by commercial depth cameras are often noisy, showing flickering pixels and erroneous patches, which degrades the results of these applications.

### Specific manifestations of the problem

1. **Noise problems**:
   - **Flickering noise**: Random noise that usually appears as rapid changes in pixel intensity.
   - **Algorithmic noise**: Holes caused by stereo-matching failures; these holes persist across multiple frames.
2. **Limitations of existing methods**:
   - **Machine-learning-based methods**: Accurate, but they consume substantial computing resources and cannot run in real time on edge devices (usually reaching only 2-3 FPS).
   - **Non-machine-learning methods**: Fast, but inaccurate when correcting errors caused by occlusion and camera movement.

### Solutions

To address these problems, the authors propose a new method named VoxDepth. Its main features are:

1. **3D point cloud construction and fusion**: Fuses the depth information from consecutive RGB-D frames into a sparse 3D point cloud to improve depth image quality.
2. **Depth image inpainting**: Generates high-resolution depth images from the low-resolution point cloud, to be used as 2D scene templates.
3. **Foreground-background combination module**: Produces accurate 2D depth images by resizing images, estimating motion, and correcting input frames.
4. **Dynamic recalculation of point clouds**: Recalculates the point cloud when the scene changes significantly, to maintain high-quality depth estimation.

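
The point cloud fusion idea can be illustrated with a minimal sketch. The summary gives no implementation details, so everything here is an assumption rather than VoxDepth's actual method: a pinhole camera model with intrinsics `fx, fy, cx, cy`, known 4x4 camera-to-world poses, depth in metres with `0` marking a hole, and a simple per-voxel average as the fusion rule. The function names `backproject` and `fuse` are hypothetical.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Turn a depth image (metres, 0 = hole) into an (N, 3) array of camera-frame points."""
    h, w = depth.shape
    v, u = np.indices((h, w))          # pixel row (v) and column (u) coordinates
    valid = depth > 0                  # skip holes left by stereo-matching failures
    z = depth[valid]
    x = (u[valid] - cx) * z / fx       # pinhole model (assumed)
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def fuse(frames, poses, intrinsics, voxel=0.05):
    """Fuse several depth frames into a sparse cloud: one averaged point per occupied voxel."""
    sums = {}                          # voxel index -> [running sum of points, count]
    for depth, pose in zip(frames, poses):
        pts = backproject(depth, *intrinsics)
        pts_world = pts @ pose[:3, :3].T + pose[:3, 3]   # camera frame -> world frame
        keys = np.floor(pts_world / voxel).astype(np.int64)
        for key, p in zip(map(tuple, keys), pts_world):
            acc = sums.setdefault(key, [np.zeros(3), 0])
            acc[0] += p
            acc[1] += 1
    return np.array([s / n for s, n in sums.values()])

# Toy usage: two static frames whose holes fall on different pixels.
frame_a = np.ones((4, 4)); frame_a[0, 0] = 0.0
frame_b = np.ones((4, 4)); frame_b[3, 3] = 0.0
cloud = fuse([frame_a, frame_b], [np.eye(4), np.eye(4)], (2.0, 2.0, 2.0, 2.0))
print(cloud.shape)
```

In this toy case each frame is missing a different pixel, so the fused cloud covers the holes of both. That is the intuition behind using a multi-frame cloud as a template for repairing a single noisy frame.
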
### Experimental results

Experiments on real-world and synthetic datasets show that VoxDepth improves depth image quality over existing methods (by 31% on the real-world dataset) while maintaining a frame rate of 27 frames per second.

### Conclusion

VoxDepth provides a fast, accurate depth image correction method suited to edge devices. It addresses the speed and quality shortcomings of existing methods and performs well when computing resources are limited.

VoxDepth: Rectification of Depth Images on Edge Devices

VDBblox: Accurate and Efficient Distance Fields for Path Planning and Mesh Reconstruction

MobiDepth: Real-Time Depth Estimation Using On-Device Dual Cameras

Iterative Error Removal for Time-of-Flight Depth Imaging

Depth Upsampling Method Via Markov Random Fields Without Edge-Misaligned Artifacts

DELTAR: Depth Estimation from a Light-Weight ToF Sensor and RGB Image

Error modelling of depth estimation based on simplified stereo vision for mobile robots

FastDepth: Fast Monocular Depth Estimation on Embedded Systems

Object Modeling and Recognition from Sparse, Noisy Data via Voxel Depth Carving

A novel edge-enabled SLAM solution using projected depth image information

Voxgraph: Globally Consistent, Volumetric Mapping using Signed Distance Function Submaps

AutoDepthNet: High Frame Rate Depth Map Reconstruction using Commodity Depth and RGB Cameras

Real-Time Monocular Depth Estimation Merging Vision Transformers on Edge Devices for AIoT

Improving RGB-D-based 3D Reconstruction by Combining Voxels and Points

Real-time Vision-based Depth Reconstruction with NVidia Jetson

Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images

OwlFusion: Depth-Only Onboard Real-Time 3D Reconstruction of Scalable Scenes for Fast-Moving MAV

Color-Guided Flying Pixel Correction in Depth Images

Real-time Monocular Depth Estimation on Embedded Systems

Real-Time Monocular Human Depth Estimation and Segmentation on Embedded Systems

Lightweight Monocular Depth Estimation on Edge Devices