Moving Object Segmentation: All You Need Is SAM (and Flow)

Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman

2024-04-19

Abstract: The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video. This is a much studied area with numerous careful, and sometimes complex, approaches and training schemes including: self-supervised learning, learning from synthetic datasets, object-centric representations, amodal representations, and many more. Our interest in this paper is to determine if the Segment Anything model (SAM) can contribute to this task. We investigate two models for combining SAM with optical flow that harness the segmentation power of SAM with the ability of flow to discover and group moving objects. In the first model, we adapt SAM to take optical flow, rather than RGB, as an input. In the second, SAM takes RGB as an input, and flow is used as a segmentation prompt. These surprisingly simple methods, without any further modifications, outperform all previous approaches by a considerable margin in both single and multi-object benchmarks. We also extend these frame-level segmentations to sequence-level segmentations that maintain object identity. Again, this simple model outperforms previous methods on multiple video object segmentation benchmarks.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses motion segmentation: discovering and segmenting the moving objects in a video. This is a widely researched field with many carefully designed and sometimes complex methods and training schemes, including self-supervised learning, learning from synthetic datasets, object-centric representations, and amodal representations. The paper asks whether the Segment Anything Model (SAM) can contribute to this task. To answer this, the authors propose two models that combine SAM with optical flow, pairing SAM's segmentation capability with the ability of flow to discover and group moving objects. The first model, FlowI-SAM, adapts SAM to take optical flow as its input instead of RGB images. The second model, FlowP-SAM, takes RGB images as input and uses optical flow as a segmentation prompt. Without further modification, both methods outperform all previous approaches by a considerable margin on single-object and multi-object benchmarks. The frame-level segmentations are then extended to sequence-level segmentations that maintain object identity across frames, and this simple extension also outperforms previous methods on multiple video object segmentation benchmarks.
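To make the idea of feeding flow to an image model more concrete: a two-channel flow field can be colour-coded as a three-channel image, with hue encoding motion direction and brightness encoding motion magnitude. The sketch below shows this standard encoding using only NumPy. It is an illustrative assumption about how flow could be presented to an RGB-style encoder, not the authors' implementation; the function name `flow_to_rgb` and the `max_magnitude` parameter are hypothetical.

```python
from typing import Optional

import numpy as np


def flow_to_rgb(flow: np.ndarray, max_magnitude: Optional[float] = None) -> np.ndarray:
    """Colour-code an (H, W, 2) flow field as an (H, W, 3) uint8 image.

    Hue encodes direction, value encodes magnitude, saturation is fixed at 1.
    Hypothetical helper for illustration; not taken from the paper.
    """
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(u, v)
    angle = np.arctan2(v, u)  # direction in [-pi, pi]

    # Normalise magnitude by the largest motion in the frame unless told otherwise.
    if max_magnitude is None:
        max_magnitude = float(magnitude.max())
    scale = max_magnitude if max_magnitude > 0 else 1.0

    hue = (angle + np.pi) / (2.0 * np.pi)        # map direction to [0, 1]
    value = np.clip(magnitude / scale, 0.0, 1.0)  # map magnitude to [0, 1]

    # HSV -> RGB with saturation 1, using the closed-form channel formula.
    h6 = hue * 6.0

    def channel(n: float) -> np.ndarray:
        k = (n + h6) % 6.0
        return value - value * np.clip(np.minimum(k, 4.0 - k), 0.0, 1.0)

    rgb = np.stack([channel(5.0), channel(3.0), channel(1.0)], axis=-1)
    return (rgb * 255.0).round().astype(np.uint8)


if __name__ == "__main__":
    # A static background with one pixel moving to the right.
    flow = np.zeros((4, 5, 2), dtype=np.float32)
    flow[1, 2] = (1.0, 0.0)
    image = flow_to_rgb(flow)
    print(image.shape, image.dtype)
```

In this encoding, static regions map to black and moving regions stand out by colour, which is why a flow image can make moving objects easy to separate from the background even when their appearance is similar to it.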
