Pixel-wise Contrastive Learning for Multi-class Instrument Segmentation in Endoscopic Robotic Surgery Videos Using Dataset-wide Sample Queues

Liping Sun, Xiong Chen
DOI: https://doi.org/10.1109/access.2024.3476622
IF: 3.9
2024-01-01
IEEE Access
Abstract: The accurate segmentation of surgical instruments in endoscopic robotic surgery is a critical challenge due to the intricate and dynamic nature of the surgical environment. Existing segmentation techniques predominantly focus on binary classification, which often falls short in complex scenarios where multiple instruments need to be precisely identified and differentiated. This limitation significantly hampers the effectiveness of computer-assisted surgical systems, where real-time and accurate multi-class segmentation is paramount. In this study, we address these challenges by introducing a novel segmentation framework that leverages pixel-wise contrastive learning to enhance multi-class instrument segmentation. Our approach uses HRNet and DeepLabV3 as backbone networks to capture and embed high-resolution features from endoscopic images. These feature embeddings are stored in dynamically updated dataset-wide queues, enabling the model to mitigate class imbalance by incorporating representative samples from less prevalent classes during training. Training combines an enhanced contrastive loss function with pixel-wise cross-entropy loss, which together facilitate robust multi-class differentiation. We validate our approach on the challenging EndoVis 2017 and EndoVis 2018 datasets, where it demonstrates superior performance compared to existing methods. On the EndoVis 2017 test dataset, our HRNet-based method achieved an average mIoU of 0.682 across 10 test subsets, surpassing the official benchmark of 0.542 by a significant margin. Similarly, on the EndoVis 2018 test dataset, our method attained an average mIoU of 0.599, exceeding the official best score of 0.579. These results underscore the efficacy of our approach in improving segmentation accuracy. The enhanced segmentation capability not only advances the state of the art in multi-class instrument segmentation but also has practical implications for improving surgical safety and efficiency in real-world robotic surgery.
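The abstract describes per-class embedding queues combined with a contrastive loss and pixel-wise cross-entropy, but gives no formulas. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a generic InfoNCE-style contrastive term (the paper's "enhanced" loss is not specified here), and the names `ClassQueues`, `pixel_contrastive_loss`, `total_loss`, the temperature of 0.1, and the loss weight of 0.1 are all hypothetical choices made for illustration.

```python
import math
from collections import deque


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ClassQueues:
    """Hypothetical helper: one fixed-size FIFO queue of pixel embeddings per class.

    Because the queues persist across batches, rare classes keep
    representative samples even when the current batch contains none.
    """

    def __init__(self, num_classes, max_len):
        self.queues = [deque(maxlen=max_len) for _ in range(num_classes)]

    def push(self, embedding, label):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.queues[label].append(normalize(embedding))


def pixel_contrastive_loss(anchor, label, queues, temperature=0.1):
    """Assumed InfoNCE-style loss for one anchor pixel against queued samples."""
    a = normalize(anchor)
    pos = [math.exp(dot(a, p) / temperature) for p in queues.queues[label]]
    neg = [
        math.exp(dot(a, n) / temperature)
        for c, q in enumerate(queues.queues)
        if c != label
        for n in q
    ]
    if not pos or not neg:
        return 0.0  # nothing to contrast against yet
    neg_sum = sum(neg)
    # Average over positives of -log(e_pos / (e_pos + sum of e_neg)).
    return sum(-math.log(p / (p + neg_sum)) for p in pos) / len(pos)


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one pixel."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


def total_loss(logits, anchor, label, queues, weight=0.1):
    """Cross-entropy plus a weighted contrastive term (weight is an assumption)."""
    return cross_entropy(logits, label) + weight * pixel_contrastive_loss(
        anchor, label, queues
    )
```

In this sketch, an anchor embedding that lies close to the queued samples of its own class and far from those of other classes yields a small contrastive term, while a misplaced embedding is penalized heavily. A real training loop would compute these quantities on batched tensors from the backbone rather than on Python lists.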