Self-Supervised Video Desmoking for Laparoscopic Surgery

Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, Wangmeng Zuo
2024-08-15
Abstract: Due to the difficulty of collecting real paired data, most existing desmoking methods train the models by synthesizing smoke, generalizing poorly to real surgical scenarios. Although a few works have explored single-image real-world desmoking in unpaired learning manners, they still encounter challenges in handling dense smoke. In this work, we address these issues together by introducing the self-supervised surgery video desmoking (SelfSVD). On the one hand, we observe that the frame captured before the activation of high-energy devices is generally clear (named pre-smoke frame, PS frame), thus it can serve as supervision for other smoky frames, making real-world self-supervised video desmoking practically feasible. On the other hand, in order to enhance the desmoking performance, we further feed the valuable information from PS frame into models, where a masking strategy and a regularization term are presented to avoid trivial solutions. In addition, we construct a real surgery video dataset for desmoking, which covers a variety of smoky scenes. Extensive experiments on the dataset show that our SelfSVD can remove smoke more effectively and efficiently while recovering more photo-realistic details than the state-of-the-art methods. The dataset, codes, and pre-trained models are available at https://github.com/ZcsrenlongZ/SelfSVD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses smoke removal (desmoking) in laparoscopic surgery videos. Existing methods face the following challenges:

1. **Difficulty in data acquisition**: Real paired smoky and smoke-free images or videos are very hard to collect. Most existing desmoking methods therefore train on synthetic smoke, which leads to poor generalization in real surgical scenarios.
2. **Poor performance on dense smoke**: A few works have explored single-image desmoking with unpaired learning, but they still struggle with dense smoke.
3. **Lack of real video datasets**: There is a lack of real-world datasets built specifically for desmoking laparoscopic surgery videos, which limits research in this area.

To solve these problems, the authors propose a self-supervised surgery video desmoking method (SelfSVD). Its main contributions are:

- **Using the pre-smoke frame as supervision**: The frame captured before the activation of high-energy devices is usually clear (the pre-smoke frame, or PS frame), so it can supervise the other, smoky frames. This makes real-world self-supervised video desmoking practically feasible.
- **Introducing a masking strategy and a regularization term**: To improve desmoking performance, information from the PS frame is also fed into the model. A masking strategy and a regularization term are introduced to avoid trivial solutions.
- **Constructing a real surgery video dataset**: The authors collected multiple laparoscopic surgery videos and built a real-world surgery video desmoking dataset (LSVD) that covers a variety of smoky scenes, on which they ran extensive experiments.
Through these improvements, SelfSVD removes smoke more effectively while restoring more photo-realistic details, outperforming existing state-of-the-art methods.

### Formula Summary

1. **Video desmoking formulation**:
   \[ \hat{I}_i = D(\{S_i\}_{i=1}^N; \Theta_D) \]
   where \(D\) is the video desmoking model, \(\Theta_D\) denotes its parameters, \(\{S_i\}_{i=1}^N\) are the smoky input frames, and \(\hat{I}_i\) is the desmoked output for frame \(i\).
2. **Self-supervised learning objective**:
   \[ \Theta_D^* = \arg\min_{\Theta_D} L(D(\{S_i\}_{i=1}^N; \Theta_D), S_{ps}) \]
   where \(S_{ps}\) is the PS frame used as supervision.
3. **Optical flow estimation and backward warping**:
   \[ \Psi_{ps\rightarrow i} = O(S_{ps}, \hat{I}_i) \]
   \[ \hat{I}_{i\rightarrow ps} = W(\hat{I}_i, \Psi_{ps\rightarrow i}) \]
   where \(O\) is the optical flow estimator and \(W\) is the backward warping operation.
4. **Reconstruction loss**:
   \[ L_{rec} = \sum_{i=1}^N \|V_i \odot (\hat{I}_{i\rightarrow ps} - S_{ps})\|_1 \]
   where \(V_i\) is a mask indicating the valid positions of the optical flow.
5. **Regularization loss**:
   \[ L_{reg} = \|M_i \odot F_{ref\rightarrow i}\|_1 \]
6. **Total loss function**:
   \[ L = L_{rec} + \lambda_{reg} L_{reg} + \lambda_{GAN} L_{GAN} \]

Through these formulas and methods, SelfSVD can handle the smoke problem in laparoscopic surgery videos more effectively.
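To make the warping and reconstruction terms concrete, the following NumPy sketch warps each desmoked output into the PS-frame view and sums a masked L1 difference against the PS frame. It is an illustration under stated assumptions and not the authors' implementation: the function names, nearest-neighbour sampling, and the in-bounds validity mask are all hypothetical choices, and the flows are taken as given instead of being produced by an optical flow estimator.

```python
import numpy as np


def backward_warp(image, flow):
    """Warp `image` (H, W, C) into the reference view with `flow` (H, W, 2).

    Assumed convention: flow[y, x] = (dx, dy) means reference pixel (x, y)
    corresponds to image pixel (x + dx, y + dy). Nearest-neighbour sampling
    keeps the sketch short; bilinear sampling is the more common choice.
    Returns the warped image and a validity mask V of shape (H, W, 1).
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    # Assumption: a position is valid when its source pixel is inside the image.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    warped = image[src_y, src_x]
    return warped, valid[..., None].astype(image.dtype)


def reconstruction_loss(outputs, flows, ps_frame):
    """L_rec: sum over frames of || V_i * (warped output - PS frame) ||_1."""
    total = 0.0
    for out, flow in zip(outputs, flows):
        warped, valid = backward_warp(out, flow)
        total += np.abs(valid * (warped - ps_frame)).sum()
    return total


def total_loss(l_rec, l_reg, l_gan, lambda_reg=1.0, lambda_gan=1.0):
    """L = L_rec + lambda_reg * L_reg + lambda_GAN * L_GAN.

    The default weights are placeholders, not values from the paper.
    """
    return l_rec + lambda_reg * l_reg + lambda_gan * l_gan
```

If an output frame is just the PS frame shifted one pixel to the right and the flow is `(dx, dy) = (1, 0)` everywhere, the warped output matches the PS frame at every valid position and the reconstruction loss is zero. The last column is masked out because its source pixel falls outside the image.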