Diffusion-Based Depth Inpainting for Transparent and Reflective Objects

Tianyu Sun, Dingchang Hu, Yixiang Dai, Guijin Wang
DOI: https://doi.org/10.1109/TCSVT.2024.3434740
2024-10-11
Abstract: Transparent and reflective objects, which are common in our everyday lives, present a significant challenge to 3D imaging techniques due to their unique visual and optical properties. Faced with these types of objects, RGB-D cameras fail to capture the real depth value with their accurate spatial information. To address this issue, we propose DITR, a diffusion-based Depth Inpainting framework specifically designed for Transparent and Reflective objects. This network consists of two stages, including a Region Proposal stage and a Depth Inpainting stage. DITR dynamically analyzes the optical and geometric depth loss and inpaints them automatically. Furthermore, comprehensive experimental results demonstrate that DITR is highly effective in depth inpainting tasks of transparent and reflective objects with robust adaptability.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the difficulty of obtaining depth information for transparent and reflective objects in 3D imaging. Because of these objects' unique visual and optical properties, RGB-D cameras struggle to capture accurate depth values for them. To solve this problem, the authors propose DITR, a diffusion-based depth inpainting framework designed specifically for transparent and reflective objects.

### Problem Background

Transparent and reflective objects are very common in daily life, but they pose significant challenges to 3D imaging. The main reason is their special optical properties, which prevent RGB-D cameras from accurately capturing their real depth. This not only degrades the performance of downstream algorithm modules but also makes it impossible to infer accurate spatial information from a single RGB image.

### Main Obstacles

The paper identifies two main obstacles:

1. **Optical Properties**: The special optical properties of transparent and reflective objects severely degrade the imaging performance of RGB-D cameras. For example, infrared light passes through transparent objects and is reflected off the surfaces of reflective objects, so the camera cannot obtain accurate depth information.
2. **Complexity of Depth Loss Generation**: Besides the depth loss caused by the optical properties of transparent and reflective objects, geometric occlusion between objects can also lead to missing depth values. In addition, the RGB camera and the depth camera have different principal optical axes, which produces optical parallax and makes missing regions in the depth map even more frequent.

### Solutions

To address these two obstacles, the authors propose DITR, a two-stage depth inpainting framework.
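
The two sources of missing depth named under Main Obstacles can be illustrated with a minimal sketch. It is not DITR's Region Proposal network, which is learned and not specified in this summary. It only labels missing pixels by whether they fall on a transparent or reflective object. The function name `split_depth_loss`, the `object_mask` input, and the convention that a depth of 0 marks a missing measurement are all assumptions made for illustration.

```python
import numpy as np


def split_depth_loss(depth, object_mask):
    """Label missing-depth pixels as 'optical' or 'geometric' (illustrative only).

    depth:       (H, W) float array; 0 marks a missing measurement
                 (assumed convention, common for RGB-D sensors).
    object_mask: (H, W) bool array; True on transparent/reflective objects
                 (hypothetical input, e.g. from a segmentation model).
    """
    missing = depth <= 0
    # Missing pixels on a transparent/reflective object: attributed to optics.
    optical = missing & object_mask
    # Missing pixels elsewhere: attributed to occlusion or camera parallax.
    geometric = missing & ~object_mask
    return optical, geometric


# Toy 3x3 depth map in metres with four missing pixels.
depth = np.array([[1.0, 0.0, 1.2],
                  [0.0, 0.0, 1.1],
                  [0.9, 1.0, 0.0]])
object_mask = np.array([[False, True, False],
                        [False, True, False],
                        [False, False, False]])

optical, geometric = split_depth_loss(depth, object_mask)
print("optical missing pixels:", int(optical.sum()))
print("geometric missing pixels:", int(geometric.sum()))
```

The two masks are disjoint by construction, so each missing region can be passed to its own inpainting step.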
DITR includes two stages:

- **Region Proposal Stage**: Decompose the depth loss into optical depth loss and geometric depth loss and process them separately.
- **Depth Inpainting Stage**: Use the inpainting strategy based on the diffusion model to repair the optical depth loss and geometric depth loss respectively.

In this way, DITR can effectively repair the depth information of transparent and reflective objects on various real-world datasets, showing good adaptability and robustness.

### Experimental Results

The experimental results show that DITR outperforms the existing SOTA methods on multiple public datasets (such as ClearGrasp, TODD, and STD). The following are some of the experimental results:

| Method | RMSE | MAE | REL | δ1.05 | δ1.10 | δ1.25 |
| --- | --- | --- | --- | --- | --- | --- |
| DeepCompletion [22] | 0.209 | 0.207 | 0.396 | 34.61 | 52.79 | 71.32 |
| DenseDepth [36] | 0.057 | 0.059 | 0.083 | 41.82 | 64.48 | 90.35 |
| SRD [37] | 0.049 | 0.044 | 0.072 | 67.11 | 79.64 | 91.33 |
| MiDaS [38] | 0.044 | 0.038 | 0.069 | 72.87 | 88.12 | 94.37 |
| LDM [24] | 0.046 | 0.044 | 0.071 | 74.18 | 83.57 | 92.19 |
| ClearGrasp [15] | 0.040 | 0.031 | 0.056 | 68.72 | 85.11 | 96.29 |
| LIDF [39] | 0.028 | 0.022 | 0.035 | 79.17 | 91.14 | 98.30 |
| TranspareNet [16] | 0.026 | 0.022 | 0.039 | 76.93 | 90.02 | 98.10 |
| DFNet [33] | 0.025 | 0.021 | 0.037 | 81.99 | 92.83 |
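
The summary does not define the table's column headers. As a minimal sketch, assuming the definitions commonly used in the depth-completion literature, RMSE, MAE and REL are the root-mean-square, mean absolute and mean relative errors over valid pixels, and δt is the percentage of pixels whose ratio max(pred/gt, gt/pred) is below the threshold t. The function name `depth_metrics` and the `valid` mask argument are hypothetical, and the paper's exact evaluation protocol (for example, which pixels count as valid) may differ.

```python
import numpy as np


def depth_metrics(pred, gt, valid=None, thresholds=(1.05, 1.10, 1.25)):
    """Compute common depth-completion metrics (assumed standard definitions).

    pred, gt: arrays of predicted and ground-truth depth, same shape.
    valid:    optional bool mask of pixels to evaluate; defaults to gt > 0.
    Predicted depths on valid pixels are assumed to be strictly positive.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = gt > 0  # assumed convention: 0 marks missing ground truth
    p, g = pred[valid], gt[valid]
    err = p - g
    out = {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "REL": float(np.mean(np.abs(err) / g)),
    }
    # Threshold accuracy: share of pixels whose ratio to ground truth is below t.
    ratio = np.maximum(p / g, g / p)
    for t in thresholds:
        out[f"delta_{t:.2f}"] = float(100.0 * np.mean(ratio < t))
    return out


# Toy example: the last pixel has no ground truth and is ignored.
gt = np.array([1.0, 2.0, 4.0, 0.0])
pred = np.array([1.0, 2.3, 4.0, 3.0])
metrics = depth_metrics(pred, gt)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower is better for RMSE, MAE and REL, while higher is better for the δ columns, which is how the rows in the table above should be read.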