Chat: Cascade Hole-Aware Transformers with Geometric Spatial Consistency for Accurate Monocular Endoscopic Depth Estimation

Ming Wu, Hao Qi, Wenkang Fan, Sunkui Ke, Hui-Qing Zeng, Yinran Chen, Xiongbiao Luo
DOI: https://doi.org/10.1109/icassp48485.2024.10447105
2024-01-01
Abstract: Monocular endoscopic depth estimation is essential for surgical navigation. Current deep learning-based estimation methods still suffer from a lack of labeled real data and from porous structures, artifacts (e.g., bubbles), illumination variations (e.g., specular highlights), and weak texture in endoscopic video images. This paper proposes a new deep learning framework of cascade hole-aware transformers with geometric spatial consistency for accurate endoscopic depth estimation without using any image annotation. Specifically, the framework employs cascade hole-aware encoders to extract structural features of deep and shallow holes, and introduces multiscale filtering decoders to suppress non-hole region features, addressing the problems of specular highlights, weak textures, and bubbles. Additionally, a geometric spatial consistency loss captures geometric information and suppresses the color difference between virtual and real images. We generated virtual endoscopic image data to train our network and tested it on both virtual and real endoscopic video images; the experimental results show that our method is robust under zero-shot evaluation on real data. In particular, our method attains a lower root mean square error (1.551±1.147 mm) and mean absolute error (1.004±0.632 mm) than state-of-the-art deep learning approaches.
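For readers unfamiliar with the two error measures quoted in the abstract, the following is a minimal sketch of how root mean square error and mean absolute error are commonly computed between a predicted and a reference depth map. It is not the authors' evaluation code: the function name `depth_errors`, the millimetre inputs, and the default rule for choosing valid pixels are all assumptions made for illustration, and the abstract does not describe the paper's masking, scaling, or per-frame averaging protocol.

```python
import numpy as np


def depth_errors(pred_mm, gt_mm, valid_mask=None):
    """Return (RMSE, MAE) in millimetres over the valid pixels.

    Hypothetical helper for illustration only; the paper's exact
    evaluation protocol is not given in the abstract.
    """
    pred = np.asarray(pred_mm, dtype=np.float64)
    gt = np.asarray(gt_mm, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")

    # Assumption: pixels with finite, positive reference depth count as valid.
    if valid_mask is None:
        valid_mask = np.isfinite(gt) & (gt > 0)

    diff = pred[valid_mask] - gt[valid_mask]
    if diff.size == 0:
        raise ValueError("no valid pixels to evaluate")

    rmse = float(np.sqrt(np.mean(diff ** 2)))  # root mean square error
    mae = float(np.mean(np.abs(diff)))         # mean absolute error
    return rmse, mae


if __name__ == "__main__":
    # Toy 2x2 depth maps in millimetres (made-up values, not from the paper).
    pred = [[1.0, 2.0], [3.0, 4.0]]
    gt = [[1.0, 2.0], [3.0, 6.0]]
    print(depth_errors(pred, gt))
```

A "mean ± standard deviation" figure such as those in the abstract would typically come from applying a function like this to each test frame and then summarizing the per-frame values, although that aggregation step is likewise an assumption here.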
What problem does this paper attempt to address?