Abstract: Due to the difficulty of collecting real paired data, most existing desmoking methods train the models by synthesizing smoke, generalizing poorly to real surgical scenarios. Although a few works have explored single-image real-world desmoking in unpaired learning manners, they still encounter challenges in handling dense smoke. In this work, we address these issues together by introducing the self-supervised surgery video desmoking (SelfSVD). On the one hand, we observe that the frame captured before the activation of high-energy devices is generally clear (named pre-smoke frame, PS frame), thus it can serve as supervision for other smoky frames, making real-world self-supervised video desmoking practically feasible. On the other hand, in order to enhance the desmoking performance, we further feed the valuable information from PS frame into models, where a masking strategy and a regularization term are presented to avoid trivial solutions. In addition, we construct a real surgery video dataset for desmoking, which covers a variety of smoky scenes. Extensive experiments on the dataset show that our SelfSVD can remove smoke more effectively and efficiently while recovering more photo-realistic details than the state-of-the-art methods. The dataset, codes, and pre-trained models are available at https://github.com/ZcsrenlongZ/SelfSVD.

What problem does this paper attempt to address?

This paper addresses smoke removal (desmoking) in laparoscopic surgery videos. Existing methods face three challenges:

1. **Difficulty in data acquisition**: Real paired smoky and smoke-free images or videos are very hard to collect. Most existing desmoking methods therefore train on synthetic smoke and generalize poorly to real surgical scenarios.
2. **Poor performance on dense smoke**: A few works have explored single-image desmoking with unpaired learning, but they still struggle with dense smoke.
3. **Lack of real video datasets**: There is a lack of real-world datasets built specifically for desmoking laparoscopic surgery videos, which limits research in this area.

To address these problems, the authors propose a self-supervised surgery video desmoking method (SelfSVD). Its main contributions are:

- **Using pre-smoke frames as supervision**: The frame captured before the high-energy devices are activated is usually clear (the pre-smoke frame, or PS frame). It can therefore supervise the other, smoky frames, which makes real-world self-supervised video desmoking practically feasible.
- **Introducing a masking strategy and a regularization term**: To improve desmoking performance, information from the PS frame is also fed into the model. A masking strategy and a regularization term keep the model from collapsing to trivial solutions.
- **Constructing a real surgery video dataset**: The authors collected multiple laparoscopic surgery videos and built a real surgery video desmoking dataset (LSVD) that covers a variety of smoky scenes, and ran extensive experiments on it.

With these improvements, SelfSVD removes smoke more effectively while restoring more photo-realistic details, outperforming existing state-of-the-art methods.

### Formula Summary

1. **Video desmoking model**:
   \[ \hat{I}_i = D(\{S_i\}_{i = 1}^N; \Theta_D) \]
   where \(\{S_i\}_{i = 1}^N\) are the smoky input frames, \(\hat{I}_i\) is the desmoked output for frame \(i\), \(D\) is the video desmoking model, and \(\Theta_D\) denotes its parameters.
2. **Self-supervised learning objective**:
   \[ \Theta_D^* = \arg\min_{\Theta_D} L(D(\{S_i\}_{i = 1}^N; \Theta_D), S_{ps}) \]
   where \(S_{ps}\) is the PS frame used as supervision.
3. **Optical flow estimation and backward warping**:
   \[ \Psi_{ps\rightarrow i} = O(S_{ps}, \hat{I}_i) \]
   \[ \hat{I}_{i\rightarrow ps} = W(\hat{I}_i, \Psi_{ps\rightarrow i}) \]
   where \(O\) estimates the optical flow from the PS frame to the output and \(W\) backward-warps the output onto the PS frame.
4. **Reconstruction loss**:
   \[ L_{rec} = \sum_{i = 1}^N \|V_i \odot (\hat{I}_{i\rightarrow ps} - S_{ps})\|_1 \]
   where \(V_i\) is a mask indicating the valid positions of the optical flow.
5. **Regularization loss**:
   \[ L_{reg} = \|M_i \odot F_{ref\rightarrow i}\|_1 \]
6. **Total loss function**:
   \[ L = L_{rec} + \lambda_{reg} L_{reg} + \lambda_{GAN} L_{GAN} \]

Together, these losses let SelfSVD be trained on real laparoscopic surgery videos, with the PS frame taking the place of paired smoke-free ground truth.
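To make the backward warping, reconstruction loss, and total loss in the formula summary more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names (`backward_warp`, `reconstruction_loss`, `total_loss`) are hypothetical, the optical flow is assumed to be given rather than estimated by \(O\), sampling is nearest-neighbour for simplicity, and the validity mask \(V_i\) is assumed to mark only pixels whose flow target stays inside the image.

```python
import numpy as np


def backward_warp(image, flow):
    """Backward-warp `image` (H, W, C) with `flow` (H, W, 2).

    flow[y, x] = (dx, dy) points from reference pixel (x, y) to the matching
    location in `image`. Nearest-neighbour sampling is an illustrative
    simplification. Returns the warped image and a validity mask (H, W).
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    # Assumption: a position is valid when its flow target lies inside the image.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = image[np.clip(src_y, 0, h - 1), np.clip(src_x, 0, w - 1)]
    warped = warped * valid[..., None]  # zero out invalid positions
    return warped, valid.astype(image.dtype)


def reconstruction_loss(warped_outputs, masks, ps_frame):
    """L_rec: sum over frames of the masked L1 distance to the PS frame."""
    total = 0.0
    for warped, mask in zip(warped_outputs, masks):
        total += np.abs(mask[..., None] * (warped - ps_frame)).sum()
    return total


def total_loss(l_rec, l_reg, l_gan, lambda_reg, lambda_gan):
    """L = L_rec + lambda_reg * L_reg + lambda_GAN * L_GAN."""
    return l_rec + lambda_reg * l_reg + lambda_gan * l_gan
```

In this sketch, masked-out pixels contribute nothing to the reconstruction loss, so regions that the flow cannot align with the PS frame do not penalize the output. The loss weights are left as arguments because the summary above does not state their values.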

Self-Supervised Video Desmoking for Laparoscopic Surgery
