Abstract: While supervised learning approaches are effective for video object segmentation, most of them require large amounts of annotations, which are expensive and time-consuming to obtain. Recently, self-supervised learning has attracted great attention because it can learn from unlabeled video sequences. However, current patch-based self-supervised video object segmentation methods only discriminate the patch from the entire image, without distinguishing the object of interest from meaningless background or occlusions. These disturbances degrade the extracted features and hinder the robustness of tracking when applied to real-world video sequences. In this paper, we propose a novel model named Tracker With Integration-Augmented Attention (TWIAA) that is label-free yet achieves strong performance. Specifically, we integrate both spatial and channel dimensions by introducing a feature spatial enhancement module and a two-stream channel module. With the combination of the two modules, the network can focus on the discriminative object and suppress irrelevant regions to improve tracking robustness. Moreover, unlike other methods that calculate features separately on the search branch and the template branch, the two designed modules, coupled with the Siamese network, compute the features of the two branches jointly to strengthen their interdependence. This interdependence is injected into both spatial and channel dimensions, so that our approach establishes richer and more discriminative associations and identifies the object more accurately. In addition, our method takes full advantage of cycle-consistency information in consecutive frames, using coherence as the learning signal to acquire object-oriented relationships. Extensive experiments and ablation studies are conducted on large VOS benchmarks, including DAVIS-2017, YouTube-VOS-2018, and YouTube-VOS-2019. The results verify that our proposed framework has both strong feature representation and competitive performance compared with supervised and self-supervised models.
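
The cycle-consistency idea mentioned in the abstract can be illustrated with a small sketch. This is not TWIAA's actual loss, which the abstract does not specify. It is a generic forward-backward formulation, assuming each frame is represented by per-location feature vectors and that correspondences are soft attention weights over cosine similarities. The function names (`affinity`, `cycle_consistency_loss`) and the `temperature` value are hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def affinity(feat_a, feat_b, temperature=0.07):
    """Row-stochastic soft correspondence from locations in a to locations in b.

    feat_a, feat_b: arrays of shape (N, C), one feature vector per location.
    The temperature is an assumed hyperparameter, not taken from the paper.
    """
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    return softmax(a @ b.T / temperature, axis=1)


def cycle_consistency_loss(frames, temperature=0.07):
    """Track forward through the frames, then backward to the first frame.

    frames: list of (N, C) feature arrays. The loss is low when every
    location in the first frame is mapped back onto itself after the cycle.
    """
    n = frames[0].shape[0]
    path = frames + frames[-2::-1]  # f0, f1, ..., fT, ..., f1, f0
    transition = np.eye(n)
    for src, dst in zip(path[:-1], path[1:]):
        transition = transition @ affinity(src, dst, temperature)
    # Cross-entropy between the round-trip distribution and the identity.
    return float(-np.mean(np.log(np.diag(transition) + 1e-12)))
```

In this sketch no labels are needed: the only supervision is that a location tracked forward and then backward should end where it started, which is the sense in which coherence between consecutive frames acts as the learning signal.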

Is Two-shot All You Need? A Label-efficient Approach for Video Segmentation in Breast Ultrasound

Two-shot Video Object Segmentation

Deep Weakly-Supervised Breast Tumor Segmentation in Ultrasound Images with Explicit Anatomical Constraints

Weakly-supervised Deep Learning for Breast Tumor Segmentation in Ultrasound Images

Fuzzy Semantic Segmentation of Breast Ultrasound Image with Breast Anatomy Constraints

BUSIS: A Benchmark for Breast Ultrasound Image Segmentation

Weakly-Supervised Ultrasound Video Segmentation with Minimal Annotations

Self-supervised Video Object Segmentation Using Integration-Augmented Attention

Automated Breast Tumor Detection and Segmentation with a Novel Computational Framework of Whole Ultrasound Images

Fully automatic tumor segmentation of breast ultrasound images with deep learning

Automatic Breast Ultrasound Image Segmentation: A Survey

A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos

Cascaded Inner-Outer Clip Retformer for Ultrasound Video Object Segmentation

Dual Teacher Model for Semi-Supervised ABUS Tumor Segmentation

Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

Improving Segmentation of Breast Ultrasound Images: Semi Automatic Two Pointers Histogram Splitting Technique

Boundary-guided and Region-aware Network with Global Scale-adaptive for Accurate Segmentation of Breast Tumors in Ultrasound Images

Learning Video Object Segmentation from Unlabeled Videos

SHA-MTL: soft and hard attention multi-task learning for automated breast cancer ultrasound image segmentation and classification

An efficient framework for lesion segmentation in ultrasound images using global adversarial learning and region-invariant loss

Morphology-Enhanced CAM-Guided SAM for weakly supervised Breast Lesion Segmentation