Neural Finite-State Machines for Surgical Phase Recognition

Hao Ding, Zhongpai Gao, Benjamin Planche, Tianyu Luan, Abhishek Sharma, Meng Zheng, Ange Lou, Terrence Chen, Mathias Unberath, Ziyan Wu
2024-11-27
Abstract: Surgical phase recognition is essential for analyzing procedure-specific surgical videos. While recent transformer-based architectures have advanced sequence processing capabilities, they struggle to maintain consistency across lengthy surgical procedures. Drawing inspiration from the finite-state interpretation of classical hidden Markov models, we introduce the neural finite-state machine (NFSM) module, which bridges procedural understanding with deep learning approaches. NFSM combines procedure-level understanding with neural networks through global state embeddings, attention-based dynamic transition tables, and transition-aware training and inference mechanisms for offline and online applications. When integrated into our future-aware architecture, NFSM improves video-level accuracy, phase-level precision, recall, and Jaccard index on the Cholec80 dataset by 2.3, 3.2, 3.0, and 4.8 percentage points, respectively. As an add-on module to existing state-of-the-art models such as Surgformer, NFSM further enhances performance, demonstrating its complementary value. Extended experiments on non-surgical datasets validate NFSM's generalizability beyond surgical domains. Comprehensive experiments demonstrate that incorporating NFSM into deep learning frameworks enables more robust and consistent phase recognition across long procedural videos.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the consistency and accuracy of surgical phase recognition (SPR) in surgical videos. Existing methods struggle to keep phase predictions consistent over long surgical videos and are especially error-prone around phase transitions. For example, the baseline method may misclassify brief actions as phase transitions, producing fragmented predictions (as shown in Figure 1a). This reflects the difficulty existing methods have in maintaining temporal consistency.
To address these problems, the authors introduce the Neural Finite-State Machine (NFSM), which combines classical finite-state machine theory with modern deep learning to make phase recognition more consistent across long surgical videos. The NFSM module achieves this through three components:
1. **Global State Embeddings**: unique phase identifiers that capture the characteristic features of each surgical phase.
2. **Attention-based Dynamic Transition Tables**: dynamically generated tables that predict phase-to-phase transitions and are updated as the surgery progresses.
3. **Transition-aware Training and Inference Mechanisms**: training and inference procedures that support both offline and online applications.
By combining short-term encoders and long-term decoders with pseudo-future embeddings, the model can learn and anticipate upcoming phase transitions. In addition, the NFSM module is designed as a plug-and-play component: it can be integrated into existing state-of-the-art models (such as Surgformer) without fine-tuning the base model, improving performance while maintaining computational efficiency.
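The interplay between per-phase embeddings, an attention-derived transition table, and transition-aware online inference can be illustrated with a small, self-contained sketch. This is not the authors' implementation, and every specific below is an assumption. Fixed scaled one-hot vectors stand in for the learned global state embeddings. The table uses a hypothetical attention form, T[i][j] = softmax over j of (e_i + c_t) · e_j / √d, where c_t is a per-frame context vector that makes the table dynamic. A standard HMM-style forward filter serves as a stand-in for the paper's transition-aware inference. The function names (`dynamic_transition_table`, `filter_step`, `online_decode`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a 1-D sequence."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dynamic_transition_table(phase_emb: Matrix, context: Vector) -> Matrix:
    """Hypothetical attention-based table: T[i][j] = softmax_j((e_i + c) . e_j / sqrt(d)).

    Each row i is a probability distribution over the next phase j.
    """
    scale = 1.0 / math.sqrt(len(context))
    table: Matrix = []
    for e_i in phase_emb:
        query = [a + c for a, c in zip(e_i, context)]  # phase embedding conditioned on context
        table.append(softmax([dot(query, key) * scale for key in phase_emb]))
    return table


def filter_step(belief: Vector, table: Matrix, frame_probs: Vector) -> Vector:
    """HMM-style forward step: propagate belief through the table, then reweight by frame evidence."""
    k = len(belief)
    prior = [sum(belief[i] * table[i][j] for i in range(k)) for j in range(k)]
    unnorm = [p * q for p, q in zip(prior, frame_probs)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def online_decode(phase_emb: Matrix, contexts: List[Vector], frame_probs_seq: List[Vector]) -> List[int]:
    """Causal (online) decoding: each label depends only on frames seen so far."""
    k = len(phase_emb)
    belief = [1.0 / k] * k  # uniform initial belief (assumption)
    labels: List[int] = []
    for context, probs in zip(contexts, frame_probs_seq):
        table = dynamic_transition_table(phase_emb, context)  # recomputed every frame
        belief = filter_step(belief, table, probs)
        labels.append(max(range(k), key=lambda j: belief[j]))
    return labels


# Toy usage: three phases, scaled one-hot vectors standing in for global state embeddings.
phase_emb = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
frame_probs = [
    [0.8, 0.1, 0.1], [0.8, 0.1, 0.1],
    [0.3, 0.6, 0.1],  # brief spurious spike toward phase 1
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1],  # sustained transition to phase 1
]
contexts = [[0.0, 0.0, 0.0] for _ in frame_probs]  # zero context: table is the same every frame
raw = [max(range(3), key=lambda j: p[j]) for p in frame_probs]
smoothed = online_decode(phase_emb, contexts, frame_probs)
print(raw)       # [0, 0, 1, 0, 1, 1, 1]
print(smoothed)  # [0, 0, 0, 0, 1, 1, 1]
```

In this toy run, the high self-transition probability (about 0.83 per phase) absorbs the one-frame spike at frame 2, while the sustained change from frame 4 onward is accepted as a genuine transition. A learned, context-dependent table would let this trade-off vary over the course of a procedure rather than stay fixed.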
Experiments on benchmark datasets such as Cholec80 and AutoLaparo show that NFSM significantly improves video-level accuracy, phase-level precision, recall, and Jaccard index, validating its effectiveness and generalizability for surgical phase recognition.
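The metrics named above can be computed frame by frame. The sketch below assumes a plain, strict definition: video-level accuracy is the fraction of correctly labeled frames, and per-phase precision TP/(TP+FP), recall TP/(TP+FN), and Jaccard TP/(TP+FP+FN) are averaged over phases present in the ground truth. Cholec80 evaluation protocols in the literature differ (for example, some use relaxed phase boundaries or different averaging), so this is not necessarily the protocol behind the paper's reported numbers. The 7-frame sequences are made up for illustration.

```python
from typing import Dict, List, Sequence


def video_accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Fraction of frames whose predicted phase equals the ground-truth phase."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def phase_metrics(pred: Sequence[int], gt: Sequence[int], num_phases: int) -> Dict[str, float]:
    """Per-phase precision, recall and Jaccard, averaged over phases present in gt."""
    precisions: List[float] = []
    recalls: List[float] = []
    jaccards: List[float] = []
    for k in range(num_phases):
        tp = sum(p == k and g == k for p, g in zip(pred, gt))
        fp = sum(p == k and g != k for p, g in zip(pred, gt))
        fn = sum(p != k and g == k for p, g in zip(pred, gt))
        if tp + fn == 0:
            # Assumed convention: skip phases that never occur in the ground truth.
            continue
        # Assumed convention: precision is 0 when the phase is never predicted.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn))
        jaccards.append(tp / (tp + fp + fn))
    n = len(recalls)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "jaccard": sum(jaccards) / n,
    }


gt = [0, 0, 0, 0, 1, 1, 1]
fragmented = [0, 0, 1, 0, 1, 1, 1]  # one spurious phase switch at frame 2
print(round(video_accuracy(fragmented, gt), 3))  # 0.857
print(phase_metrics(fragmented, gt, num_phases=2))
# {'precision': 0.875, 'recall': 0.875, 'jaccard': 0.75}
```

A single spurious frame lowers every per-phase Jaccard here, because it counts as a false positive for the predicted phase and a false negative for the true one, and both enter the Jaccard denominator.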