PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation

Shilin Yan, Xiaohao Xu, Renrui Zhang, Lingyi Hong, Wenchao Chen, Wenqiang Zhang, Wei Zhang

2024-07-28

Abstract: Panoramic videos contain richer spatial information and have attracted considerable attention for the exceptional experience they offer in fields such as autonomous driving and virtual reality. However, existing datasets for video segmentation focus only on conventional planar images. To address this gap, we present a panoramic video dataset, PanoVOS. The dataset provides 150 videos with high resolution and diverse motions. To quantify the domain gap between 2D planar videos and panoramic videos, we evaluate 15 off-the-shelf video object segmentation (VOS) models on PanoVOS. Through error analysis, we find that all of them fail to handle the pixel-level content discontinuities of panoramic videos. We therefore present a Panoramic Space Consistency Transformer (PSCFormer), which can effectively use the semantic boundary information of the previous frame for pixel-level matching with the current frame. Extensive experiments demonstrate that, compared with previous SOTA models, PSCFormer has a clear advantage in segmentation results under the panoramic setting. Our dataset poses new challenges in panoramic VOS, and we hope that PanoVOS can advance the development of panoramic segmentation and tracking.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the challenges of video object segmentation (VOS) in panoramic videos. Existing VOS methods focus mainly on conventional planar images. Panoramic videos, with their 360°×180° field of view, capture richer spatial information but also introduce new challenges such as content discontinuity and severe distortion, which conventional VOS methods do not adequately handle. The main contributions of the paper are:

1. **A panoramic video object segmentation dataset (PanoVOS)**: It contains 150 videos and 19,145 instance annotations, filling the gap left by the lack of panoramic video segmentation datasets with long-term, instance-level annotations.
2. **Extensive experimental evaluation**: The authors evaluate 15 existing VOS models and show that current methods handle content discontinuity in panoramic videos poorly.
3. **A new model, PSCFormer**: By introducing a Panoramic Space Consistency (PSC) block, it addresses content discontinuity and pixel-level matching in panoramic video segmentation.

These contributions aim to advance panoramic video segmentation and tracking, particularly in applications such as autonomous driving and virtual reality.
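To make "content discontinuity" concrete, here is a minimal sketch in plain Python. It assumes, as an illustration not stated in the summary above, that frames are stored in an equirectangular-style layout whose left and right edges are adjacent on the sphere. The helper `count_components` and the toy mask are hypothetical, and the code is not the paper's PSC block. It only shows why a model that treats the frame as a planar image can see one object as two disconnected pieces.

```python
from collections import deque


def count_components(mask, wrap_horizontal=False):
    """Count 4-connected foreground regions in a binary mask (list of rows).

    With wrap_horizontal=True the first and last columns are treated as
    neighbours, which is the assumed panoramic (seam-aware) view.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # New, unvisited foreground pixel: flood-fill its whole region.
            components += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if wrap_horizontal:
                        nx %= width  # step across the left/right seam
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return components


# Toy 4x8 mask: a single object that straddles the left/right seam.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

planar_count = count_components(mask)                          # planar view
panoramic_count = count_components(mask, wrap_horizontal=True)  # seam-aware view
print(planar_count, panoramic_count)
```

In the planar view the object is counted as two regions, while the seam-aware view counts one. That mismatch is the kind of pixel-level discontinuity the evaluated planar VOS models are reported to struggle with.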

Learning Spatiotemporal Relationships with a Unified Framework for Video Object Segmentation

TransVOS: Video Object Segmentation with Transformers

PASS: Panoramic Annular Semantic Segmentation

Aerial-PASS: Panoramic Annular Scene Segmentation in Drone Videos

Can We PASS Beyond the Field of View? Panoramic Annular Semantic Segmentation for Real-World Surrounding Perception

Omnisupervised Omnidirectional Semantic Segmentation

PVO: Panoptic Visual Odometry

Video Object Segmentation in Panoptic Wild Scenes

Open Panoramic Segmentation

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video

Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation

Semantic Segmentation of Panoramic Images Using a Synthetic Dataset

Panoramic Panoptic Segmentation: Towards Complete Surrounding Understanding via Unsupervised Contrastive Learning

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning

Large-scale Video Panoptic Segmentation in the Wild: A Benchmark

Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation