Abstract: Computer-aided ultrasound (US) imaging is an important prerequisite for early clinical diagnosis and treatment. Because US image quality is poor and tumor areas are blurry, recent memory-based video object segmentation (VOS) models achieve frame-level segmentation by performing intensive similarity matching among past frames, which inevitably introduces computational redundancy. Furthermore, the attention mechanism used in recent models allocates the same level of attention across all spatial-temporal memory features without distinction, which may degrade accuracy. In this paper, we first build a larger annotated benchmark dataset for breast lesion segmentation in ultrasound videos, and then propose a lightweight clip-level VOS framework that achieves higher segmentation accuracy while maintaining speed. The Inner-Outer Clip Retformer is proposed to extract spatial-temporal tumor features in parallel. Specifically, the Outer Clip Retformer extracts tumor movement features from past video clips to locate the tumor in the current clip, while the Inner Clip Retformer extracts detailed features of the current tumor to produce more accurate segmentation results. A Clip Contrastive loss function is further proposed to align the extracted tumor features along both the spatial and temporal dimensions, improving segmentation accuracy. In addition, the Global Retentive Memory is proposed to maintain complementary tumor features at a lower computational cost, so that coherent temporal movement features can be generated. In this way, our model significantly improves spatial-temporal perception without a large increase in the number of parameters, achieving more accurate segmentation results while maintaining a fast segmentation speed.
Finally, we conduct extensive experiments to evaluate our proposed model on several video object segmentation datasets; the results show that our framework outperforms state-of-the-art segmentation methods.

Weakly-Supervised Ultrasound Video Segmentation with Minimal Annotations

Weakly-Interactive-Mixed Learning: Less Labelling Cost for Better Medical Image Segmentation.

Weakly-supervised Deep Learning for Breast Tumor Segmentation in Ultrasound Images

Weakly Supervised Video Object Segmentation via Dual-attention Cross-branch Fusion

Morphology-Enhanced CAM-Guided SAM for weakly supervised Breast Lesion Segmentation

An efficient framework for lesion segmentation in ultrasound images using global adversarial learning and region-invariant loss

Deep Weakly-Supervised Breast Tumor Segmentation in Ultrasound Images with Explicit Anatomical Constraints

Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

A New Dataset and A Baseline Model for Breast Lesion Detection in Ultrasound Videos

Weakly supervised real-time instance segmentation for ultrasound images of median nerves

Weakly Semi-Supervised Detection in Lung Ultrasound Videos

Cascaded Inner-Outer Clip Retformer for Ultrasound Video Object Segmentation

Combining unsupervised constraints on weakly supervised semantic segmentation of skin cancer

Weakly Supervised Histopathology Image Segmentation with Sparse Point Annotations

MambaEviScrib: Mamba and Evidence-Guided Consistency Enhance CNN Robustness for Scribble-Based Weakly Supervised Ultrasound Image Segmentation

Weakly Supervised Lesion Detection and Diagnosis for Breast Cancers with Partially Annotated Ultrasound Images

Segmentation in Weakly Labeled Videos via a Semantic Ranking and Optical Warping Network

Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Guidewire Segmentation in Robot-Assisted Cardiovascular Catheterization

Looking Beyond Single Images for Weakly Supervised Semantic Segmentation Learning.

Frame-to-video-based Semi-supervised Lung Ultrasound Scoring Model

Weighted Area Constraints-Based Breast Lesion Segmentation in Ultrasound Image Analysis