Tracking Everything in Robotic-Assisted Surgery

Bohan Zhan,Wang Zhao,Yi Fang,Bo Du,Francisco Vasconcelos,Danail Stoyanov,Daniel S. Elson,Baoru Huang
2024-09-30
Abstract: Accurate tracking of tissues and instruments in videos is crucial for Robotic-Assisted Minimally Invasive Surgery (RAMIS), as it enables the robot to comprehend the surgical scene with precise locations and interactions of tissues and tools. Traditional keypoint-based sparse tracking is limited to distinctive feature points, while flow-based dense two-view matching suffers from long-term drift. Recently, the Tracking Any Point (TAP) algorithm was proposed to overcome these limitations and achieve dense, accurate, long-term tracking. However, its efficacy in surgical scenarios remains untested, largely due to the lack of a comprehensive surgical tracking dataset for evaluation. To address this gap, we introduce a new annotated surgical tracking dataset for benchmarking tracking methods in surgical scenarios, comprising real-world surgical videos with complex tissue and instrument motions. We extensively evaluate state-of-the-art (SOTA) TAP-based algorithms on this dataset and reveal their limitations in challenging surgical scenarios, including fast instrument motion, severe occlusions, and motion blur. Furthermore, we propose a new tracking method, SurgMotion, to address these challenges and further improve tracking performance. Our proposed method outperforms most TAP-based algorithms in surgical instrument tracking, and demonstrates especially significant improvements over baselines in challenging medical videos.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of accurately tracking tissues and instruments in robot-assisted minimally invasive surgery (RAMIS). Specifically:

1. **Limitations of existing methods**:
   - **Traditional keypoint-based sparse tracking**: It depends on feature points, performs poorly on deforming or weakly textured tissue, and can only track a sparse set of points.
   - **Optical-flow-based dense two-view matching**: It can track densely, but it is prone to drift over long sequences, especially in the presence of occlusion and motion blur.
2. **Challenges in applying TAP algorithms**:
   - Although TAP (Tracking Any Point) algorithms perform well in general scenarios, they have not been fully validated in the surgical environment. Surgical videos pose distinctive challenges, such as poor lighting, a lack of salient texture, and strong specular reflections, which make it difficult to apply TAP algorithms directly to surgical scenes.
3. **Lack of appropriate evaluation datasets**:
   - There is currently no tracking dataset designed specifically for surgical scenarios that supports this kind of evaluation. Existing datasets such as SuPer and SurgT are limited in the quantity and quality of their annotations and cannot comprehensively evaluate TAP algorithms.

To address these problems, the authors take the following steps:

- **Create a new surgical tracking dataset**: They collect real-world surgical videos containing complex tissue and instrument motions and annotate them in detail, frame by frame, to provide an accurate tracking benchmark.
- **Evaluate existing TAP algorithms**: They extensively evaluate existing TAP algorithms on the new dataset, revealing their limitations in challenging situations such as rapid instrument motion, severe occlusion, and motion blur.
- **Propose a new tracking method, SurgMotion**: Building on the OmniMotion framework, they introduce a Tool Mask Constraint, an As-Rigid-As-Possible (ARAP) Constraint, and Sparse Feature Matching Guidance to improve the tracking accuracy of surgical instruments, with the largest gains on challenging videos.

With these improvements, SurgMotion outperforms most existing TAP-based algorithms in surgical instrument tracking while maintaining high accuracy in tissue tracking.
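
To give a rough intuition for what an as-rigid-as-possible constraint rewards, the sketch below scores how well a set of tracked points preserves its local distances between two frames. It is a simplified distance-preservation proxy written for illustration only, not the formulation used in SurgMotion; the function name `arap_loss`, the neighbour count `k`, and the synthetic test points are all assumptions introduced here.

```python
import numpy as np


def arap_loss(points_a, points_b, k=4):
    """Mean squared change in distance from each point to its k nearest neighbours.

    points_a, points_b: (N, 2) arrays with the same N tracked points in two frames.
    Hypothetical helper for illustration; assumes the points are distinct and N > k.
    """
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances within each frame, shape (N, N).
    dist_a = np.linalg.norm(points_a[:, None, :] - points_a[None, :, :], axis=-1)
    dist_b = np.linalg.norm(points_b[:, None, :] - points_b[None, :, :], axis=-1)
    # Neighbours are chosen in the first frame; column 0 is the point itself.
    neighbours = np.argsort(dist_a, axis=1)[:, 1:k + 1]
    rows = np.arange(len(points_a))[:, None]
    diff = dist_b[rows, neighbours] - dist_a[rows, neighbours]
    return float(np.mean(diff ** 2))


# Synthetic example: a rigid motion versus a non-rigid stretch.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, size=(50, 2))

theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
rigid = pts @ rotation.T + np.array([5.0, -3.0])   # rotate and translate
stretched = pts * np.array([1.5, 1.0])             # stretch along x only

print("rigid motion loss:    ", arap_loss(pts, rigid))
print("stretched motion loss:", arap_loss(pts, stretched))
```

A rigid rotation plus translation leaves all pairwise distances unchanged, so its loss is zero up to floating-point error, whereas the stretch is penalised. A rigid surgical instrument is the kind of object for which such a penalty is a plausible prior, while deforming tissue is not.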