Abstract: Purpose: Single-image UNet architectures, commonly employed in polyp segmentation, lack the temporal insight that clinicians gain from video data when diagnosing polyps. To mirror clinical practice more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with the least parameter overhead, making it potentially suitable for edge devices. Methods: PolypNextLSTM employs a UNet-like structure with ConvNext-Tiny as its backbone, omitting the last two layers to reduce parameter overhead. Our temporal fusion module, a Convolutional Long Short-Term Memory (ConvLSTM), exploits temporal features. Our primary novelty lies in PolypNextLSTM, which is the leanest in parameters and the fastest of the evaluated models while surpassing the performance of five state-of-the-art image-based and five video-based deep learning models. The evaluation on the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts such as fast motion and occlusion. Results: Comparison against five image-based and five video-based models demonstrates PolypNextLSTM's superiority, achieving a Dice score of 0.7898 on the hard-to-detect polyp test set, surpassing image-based PraNet (0.7519) and video-based PNS+ (0.7486). Notably, our model excels in videos featuring complex artefacts such as ghosting and occlusion. Conclusion: PolypNextLSTM, integrating a pruned ConvNext-Tiny with ConvLSTM for temporal fusion, not only exhibits superior segmentation performance but also achieves the highest frames per second among the evaluated models. Code can be found here: https://github.com/mtec-tuhh/PolypNextLSTM .
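
The Dice scores reported above measure the overlap between a predicted mask and the ground-truth mask. As a minimal sketch, assuming binary masks given as nested lists of 0/1 values, the standard Dice coefficient 2|A ∩ B| / (|A| + |B|) can be computed as below. The function name `dice_score` and the convention of returning 1.0 for two empty masks are illustrative assumptions; the abstract does not specify the thresholding or per-frame averaging used in the actual evaluation.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration only; not taken from the
    PolypNextLSTM code base.
    """
    # Flatten both 2D masks into 1D lists of pixels.
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    # |A| + |B|: total foreground pixels across both masks.
    total = sum(flat_pred) + sum(flat_target)

    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

Here two of the three predicted foreground pixels coincide with the ground truth, so the score is 4/6, roughly 0.667. A dataset-level figure would typically average such scores over frames or videos, which is again an assumption here.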

Semi-supervised Spatial Temporal Attention Network for Video Polyp Segmentation

TCCNet: Temporally Consistent Context-Free Network for Semi-supervised Video Polyp Segmentation

SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation

Video Polyp Segmentation: A Deep Learning Perspective

A novel non-pretrained deep supervision network for polyp segmentation

Probabilistic Modeling Ensemble Vision Transformer Improves Complex Polyp Segmentation

Progressively Normalized Self-Attention Network for Video Polyp Segmentation

Shallow Attention Network for Polyp Segmentation

PolypNextLSTM: a lightweight and fast polyp video segmentation network using ConvNext and ConvLSTM

Spatio-Temporal Video Segmentation of Static Scenes and Its Applications

LightCF-Net: A Lightweight Long-Range Context Fusion Network for Real-Time Polyp Segmentation

Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning

PDCA-Net: Parallel dual-channel attention network for polyp segmentation

Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation

Holistic Prototype Attention Network for Few-Shot Video Object Segmentation

A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos

SALI: Short-term Alignment and Long-term Interaction Network for Colonoscopy Video Polyp Segmentation

Multi-scale Information Sharing and Selection Network with Boundary Attention for Polyp Segmentation

Spatiotemporal Graph Neural Network Based Mask Reconstruction for Video Object Segmentation

Annotation-Efficient Polyp Segmentation via Active Learning