VoxDepth: Rectification of Depth Images on Edge Devices

Yashashwee Chakrabarty, Smruti Ranjan Sarangi
2024-07-21
Abstract: Autonomous mobile robots like self-flying drones and industrial robots heavily depend on depth images to perform tasks such as 3D reconstruction and visual SLAM. However, the presence of inaccuracies in these depth images can greatly hinder the effectiveness of these applications, resulting in sub-optimal results. Depth images produced by commercially available cameras frequently exhibit noise, which manifests as flickering pixels and erroneous patches. ML-based methods to rectify these images are unsuitable for edge devices that have very limited computational resources. Non-ML methods are much faster but have limited accuracy, especially for correcting errors that are a result of occlusion and camera movement. We propose a scheme called VoxDepth that is fast, accurate, and runs very well on edge devices. It relies on a host of novel techniques: 3D point cloud construction and fusion, and using it to create a template that can fix erroneous depth images. VoxDepth shows superior results on both synthetic and real-world datasets. We demonstrate a 31% improvement in quality as compared to state-of-the-art methods on real-world depth datasets, while maintaining a competitive framerate of 27 FPS (frames per second).
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses inaccuracies in depth images, and in particular their impact on tasks such as 3D reconstruction and visual SLAM performed by autonomous mobile robots (such as self-flying drones and industrial robots). Depth images generated by commercial depth cameras often contain noise, which manifests as flickering pixels and erroneous patches and seriously degrades these applications.

### Specific manifestations of the problem

1. **Noise problems**:
   - **Flickering noise**: This noise is random and usually appears as rapid changes in pixel intensity.
   - **Algorithmic noise**: Holes caused by stereo-matching failure, which persist across multiple frames.
2. **Limitations of existing methods**:
   - **Machine-learning-based methods**: Although they are highly accurate, they consume a large amount of computing resources and cannot run in real time on edge devices (usually reaching only 2-3 FPS).
   - **Non-machine-learning methods**: Although they are fast, they have low accuracy when dealing with errors caused by occlusion and camera movement.

### Solutions

To solve these problems, the authors propose a new method named VoxDepth. Its main features include:

1. **3D point cloud construction and fusion**: Improve the quality of depth images by fusing the depth information in consecutive RGB-D frames into a sparse 3D point cloud.
2. **Depth image inpainting**: Generate high-resolution depth images from low-resolution point clouds to be used as 2D scene templates.
3. **Foreground-background combination module**: Generate accurate 2D depth images by resizing images, estimating motion, and correcting input frames.
4. **Dynamic recalculation of point clouds**: Recalculate the point cloud when the scene changes significantly to maintain high-quality depth estimation.
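
The fusion-then-template idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple pinhole camera model, a translation-only camera pose, and a fixed voxel size, and it omits the inpainting, motion estimation, and foreground-background steps entirely. All function names (`backproject`, `fuse`, `render_template`, `fill_holes`) and parameter values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def backproject(depth, fx, fy, cx, cy, translation=(0.0, 0.0, 0.0)):
    """Turn a depth image (rows of metres, 0 = hole) into 3D points.

    Assumes a pinhole camera and a translation-only pose (an assumption
    made for brevity; a real pipeline would also apply a rotation).
    """
    tx, ty, tz = translation
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip invalid pixels (holes)
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x + tx, y + ty, z + tz))
    return points


def fuse(frames, fx, fy, cx, cy, voxel=0.05):
    """Fuse several (depth, translation) frames into a sparse point cloud.

    Points that fall into the same voxel are averaged, so the cloud stays
    sparse no matter how many frames are added.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for depth, translation in frames:
        for x, y, z in backproject(depth, fx, fy, cx, cy, translation):
            key = (int(x // voxel), int(y // voxel), int(z // voxel))
            acc = sums[key]
            acc[0] += x
            acc[1] += y
            acc[2] += z
            acc[3] += 1
    return [(sx / n, sy / n, sz / n) for sx, sy, sz, n in sums.values()]


def render_template(points, width, height, fx, fy, cx, cy):
    """Project fused points back to 2D, keeping the nearest depth per pixel."""
    template = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue
        u = round(x * fx / z + cx)
        v = round(y * fy / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if template[v][u] == 0.0 or z < template[v][u]:
                template[v][u] = z
    return template


def fill_holes(depth, template):
    """Replace invalid pixels in a frame with the template's depth."""
    return [
        [z if z > 0 else t for z, t in zip(drow, trow)]
        for drow, trow in zip(depth, template)
    ]


# Two 2x2 frames from a static camera, each with one hole (0.0).
intrinsics = dict(fx=1.0, fy=1.0, cx=0.5, cy=0.5)
frame_a = [[1.0, 0.0], [1.0, 1.0]]
frame_b = [[1.0, 1.0], [0.0, 1.0]]

fused = fuse([(frame_a, (0.0, 0.0, 0.0)), (frame_b, (0.0, 0.0, 0.0))], **intrinsics)
template = render_template(fused, width=2, height=2, **intrinsics)
repaired = fill_holes(frame_a, template)
```

In this toy case each frame is missing a pixel that the other frame observes, so the fused cloud covers the whole scene and the rendered template can fill the hole in `frame_a`. Holes that persist across all fused frames would remain empty in the template, which is where a step such as the inpainting described above would be needed.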
### Experimental results

The experimental results of VoxDepth on real-world and synthetic datasets show that, compared with existing methods, it significantly improves the quality of depth images (by 31% on the real-world dataset) while maintaining a frame rate of 27 frames per second.

### Conclusion

VoxDepth provides a fast, accurate depth image correction method suited to edge devices. It addresses the speed and quality shortcomings of existing methods and performs especially well when computing resources are limited.