Optimised ProPainter for Video Diminished Reality Inpainting

Pengze Li, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero
2024-06-04
Abstract: In this paper, part of the DREAMING Challenge - Diminished Reality for Emerging Applications in Medicine through Inpainting, we introduce a refined video inpainting technique optimised from the ProPainter method to meet the specialised demands of medical imaging, specifically in the context of oral and maxillofacial surgery. Our enhanced algorithm employs the zero-shot ProPainter, featuring optimized parameters and pre-processing, to adeptly manage the complex task of inpainting surgical video sequences, without requiring any training process. It aims to produce temporally coherent and detail-rich reconstructions of occluded regions, facilitating clearer views of operative fields. The efficacy of our approach is evaluated using comprehensive metrics, positioning it as a significant advancement in the application of diminished reality for medical purposes.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem that key areas in oral and maxillofacial surgery videos are obscured by surgical instruments and the surgeon's hands. Specifically, the authors propose an optimized version of the ProPainter video inpainting method that generates temporally coherent and detail-rich reconstructions of occluded areas without any training, thereby providing a clearer surgical view.

### Main Issues

1. **Occluded Area Restoration**: In surgical videos, surgical instruments and the surgeon's hands can obscure the patient's face or other important parts, so an algorithm must remove these occlusions and restore the background behind them.
2. **Temporal Consistency**: Maintaining temporal coherence during video inpainting is crucial to avoid noticeable differences between frames.
3. **Real-time Processing**: To meet practical application needs, the inpainting algorithm must complete processing within a limited time to support real-time or near-real-time scenarios.

### Solution

- **Optimized ProPainter Model**: The authors optimized the existing ProPainter method, including parameter adjustments and preprocessing techniques, to better adapt it to the specific needs of medical imaging.
- **Zero-shot Learning**: Pre-trained model weights are used to achieve efficient video inpainting without additional training.
- **Multi-domain Propagation Mechanism**: Image propagation and feature propagation together ensure spatial and temporal consistency.
- **Sparse Video Transformer**: A sparse video transformer improves computational efficiency while maintaining the quality of the inpainting results.

### Experimental Results

- **Performance Evaluation**: The model's performance was evaluated using several metrics (including LPIPS, FID, MAE, and PSNR), and the results showed that this method outperformed existing state-of-the-art methods (such as Stable Diffusion) on multiple metrics.
- **Practical Application**: In the first phase of the DREAMING challenge, this method achieved first place, demonstrating its effectiveness in practical application scenarios.

### Conclusion

By optimizing the ProPainter framework, this study successfully addressed the issue of occluded area restoration in oral and maxillofacial surgery videos, providing temporally coherent and detail-rich reconstruction results, thereby offering strong support for enhancing visual clarity in medical practice.
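Two of the pixel-level metrics named under Experimental Results, MAE and PSNR, have standard definitions that are easy to state in code. The sketch below is not the challenge's evaluation script. It assumes frames are NumPy arrays with values scaled to [0, 1], and the optional `mask` argument (restricting MAE to the inpainted region) is an illustrative addition, not something the source describes. LPIPS and FID depend on pretrained networks and are left out.

```python
import numpy as np


def mae(pred, target, mask=None):
    """Mean absolute error between two frames with values in [0, 1].

    If `mask` (H x W, truthy where pixels were inpainted) is given,
    the error is averaged over the masked pixels only. The mask option
    is an assumption made for illustration.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    diff = np.abs(pred - target)
    if mask is not None:
        m = np.asarray(mask, dtype=bool)
        if not m.any():
            raise ValueError("mask selects no pixels")
        diff = diff[m]  # boolean index over the leading H x W axes
    return float(diff.mean())


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_val ** 2 / mse))


def video_scores(pred_frames, target_frames):
    """Average per-frame MAE and PSNR over a sequence of frame pairs."""
    pairs = list(zip(pred_frames, target_frames))
    if not pairs:
        raise ValueError("no frames given")
    maes = [mae(p, t) for p, t in pairs]
    psnrs = [psnr(p, t) for p, t in pairs]
    return float(np.mean(maes)), float(np.mean(psnrs))
```

Lower MAE and higher PSNR indicate a closer match to the ground-truth frames. Averaging per-frame scores, as `video_scores` does, is one common convention; a different aggregation would give different numbers.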