Abstract: Few-shot video object segmentation (FSVOS) aims to accurately segment novel objects in given video sequences, where the target objects are specified by a limited number of annotated support images. Most previous top-performing methods adopt either the support-query semantic correlation learning paradigm or the intra-query temporal correlation learning paradigm. The former fails to model temporal consistency across frames, resulting in temporally inconsistent segmentation, while the latter loses diverse support object information, leading to incomplete segmentation. We therefore argue that it is more desirable to learn both correlations in a collaborative manner. In this work, we examine the issues that arise when few-shot image segmentation methods are combined with video object segmentation methods, and propose a dedicated Collaborative Correlation Network (CoCoNet), comprising a pixel correlation calibration module and a temporal correlation mining module, to address them. The proposed CoCoNet has two main merits. First, the pixel correlation calibration module mitigates noise in the support-query correlation by integrating the affinity learning strategy with the prototype learning strategy. Specifically, we employ Optimal Transport to enrich pixel correlation with contextual information, thereby reducing intra-class differences between support and query. Second, the temporal correlation mining module alleviates uncertainty in the initial frame and establishes reliable guidance for subsequent frames of the query video. With these two modules working together, CoCoNet establishes support-query and temporal correlations simultaneously and achieves accurate FSVOS. Extensive experimental results on two challenging benchmarks demonstrate that our method performs favorably against state-of-the-art FSVOS methods.
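
The abstract does not say how Optimal Transport is applied to the support-query pixel correlation. As a rough illustration only, the sketch below assumes a common formulation: cosine similarity between query and support pixel features, turned into a cost, and then normalized with entropy-regularized Sinkhorn iterations under uniform marginals. The function names, the `epsilon` and `num_iters` parameters, and the uniform marginals are all assumptions made for this example, not details taken from CoCoNet.

```python
import numpy as np


def cosine_correlation(query_feats, support_feats):
    """Pairwise cosine similarity between query and support pixel features.

    query_feats: (num_query, dim), support_feats: (num_support, dim).
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    return q @ s.T  # shape (num_query, num_support)


def sinkhorn_plan(correlation, epsilon=0.5, num_iters=200):
    """Entropy-regularized transport plan derived from a correlation matrix.

    Assumption: cost = 1 - correlation and uniform marginals on both sides.
    """
    cost = 1.0 - correlation
    kernel = np.exp(-cost / epsilon)  # Gibbs kernel, strictly positive
    n_q, n_s = kernel.shape
    row_marginal = np.full(n_q, 1.0 / n_q)
    col_marginal = np.full(n_s, 1.0 / n_s)
    u = np.ones(n_q)
    v = np.ones(n_s)
    for _ in range(num_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = row_marginal / (kernel @ v)
        v = col_marginal / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


# Hypothetical usage with random features standing in for backbone outputs.
rng = np.random.default_rng(0)
query = rng.normal(size=(6, 16))
support = rng.normal(size=(5, 16))
plan = sinkhorn_plan(cosine_correlation(query, support))
```

Because every row and column of the plan must match its marginal, no single support pixel can dominate the matching, which is one way such a step can suppress noisy point-to-point correlations.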

Dual Correlation Network for Efficient Video Semantic Segmentation

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

A Large-Scale Point Cloud Semantic Segmentation Network Via Local Dual Features and Global Correlations

Exploring Temporal Feature Correlation for Efficient and Stable Video Semantic Segmentation

Exploring the Better Correlation for Few-Shot Video Object Segmentation

Dual Graph Convolutional Network for Semantic Segmentation

Video object segmentation via couple streams and feature memory

Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance

A dual-branch hybrid network of CNN and transformer with adaptive keyframe scheduling for video semantic segmentation

Learning Spatial-Semantic Features for Robust Video Object Segmentation

LinkNet: 2D-3D Linked Multi-Modal Network for Online Semantic Segmentation of RGB-D Videos

Attention-based Dual Context Aggregation for Image Semantic Segmentation

Weakly Supervised Video Object Segmentation via Dual-attention Cross-branch Fusion

Dual Cross-Attention for Video Object Segmentation Via Uncertainty Refinement

Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation

Dual-branch deep cross-modal interaction network for semantic segmentation with thermal images

Dual-Path Feature Fusion Network for Semantic Segmentation of Remote Sensing Images

SCREENING AND CHARACTERIZATION OF KERATINASE FROM Bacillus licheniformis ISOLATED FROM NAMAKKAL POULTRY FARM

Compact interactive dual-branch network for real-time semantic segmentation

Interactive Fusion and Correlation Network for Three-Modal Images Few-Shot Semantic Segmentation

Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation