Frequency-Based Temporal Analysis Network for Accurate Phase Recognition from Surgical Videos

Sainan Zhang, Tianze Xu, Zhi Cao, Hongen Liao, Guochen Ning, Fang Chen
DOI: https://doi.org/10.1109/isbi56570.2024.10635806
2024-01-01
Abstract: Surgical phase recognition is a crucial yet challenging task for computer-assisted surgery systems. Existing approaches often rely on temporal convolution for temporal modeling, but they may overlook spatial context and struggle with frames in surgical videos that are vague or nondescript. In this study, we propose a novel surgical phase recognition method that models clip-level information, rather than isolated frame information, from a frequency perspective. Specifically, we introduce a frequency-based temporal analysis module to improve the extraction of temporal features. Furthermore, we implement a two-branch structure to effectively model long-term information and minimize over-segmentation of ambiguous frames. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate the outstanding performance of our proposed method.
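The idea of modeling clip-level information "from a frequency perspective" can be made concrete with a minimal NumPy sketch. This is not the authors' frequency-based temporal analysis module, because the abstract does not specify its design. It assumes a simple, fixed low-pass filter applied with a real FFT along the time axis of per-frame feature vectors. The function name `frequency_temporal_filter` and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np


def frequency_temporal_filter(clip_feats: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Hypothetical sketch: low-pass filter per-frame features along time.

    clip_feats: array of shape (T, C), one C-dim feature vector per frame.
    keep_ratio: assumed fraction of the lowest temporal-frequency bins to keep.
    """
    T = clip_feats.shape[0]
    # Real FFT along the temporal axis gives (T // 2 + 1, C) complex bins.
    spectrum = np.fft.rfft(clip_feats, axis=0)
    n_bins = spectrum.shape[0]
    n_keep = max(1, int(np.ceil(keep_ratio * n_bins)))
    # Fixed binary mask: keep low frequencies (slow trends across the clip),
    # drop high frequencies (rapid frame-to-frame fluctuations).
    mask = np.zeros((n_bins, 1))
    mask[:n_keep] = 1.0
    filtered = spectrum * mask
    # Inverse real FFT; n=T restores the original clip length.
    return np.fft.irfft(filtered, n=T, axis=0)


# Usage: a 16-frame clip with 4 feature channels and one outlier frame.
clip = np.zeros((16, 4))
clip[7] = 10.0
smoothed = frequency_temporal_filter(clip)
print(smoothed.shape, np.abs(smoothed).max())
```

Each output frame depends on every frame in the clip through the spectrum. As a result, an isolated outlier frame, such as a blurred or nondescript one, is damped rather than passed through unchanged. This illustrates one plausible way a frequency view could reduce over-segmentation. A learned variant could replace the fixed binary mask with trainable weights per frequency bin, but whether the paper does this is not stated in the abstract.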