Abstract: Depth estimation is a fundamental task in many vision applications. With the growing popularity of omnidirectional cameras, tackling this problem in spherical space has become a new trend. In this paper, we propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image. An omnidirectional image has a full field of view, providing a much more complete description of the scene than a perspective image. However, the fully convolutional networks that most current solutions rely on fail to capture rich global context from the panorama. To address this issue, as well as the distortion introduced by the equirectangular projection of the panorama, we propose Cubemap Vision Transformers (CViT), a new transformer-based architecture that can model long-range dependencies and extract distortion-free global features from the panorama. We show that cubemap vision transformers have a global receptive field at every stage and can provide globally coherent predictions for spherical signals. As a general architecture, CViT removes the restrictions that many other monocular panoramic depth estimation methods impose on the panorama. To preserve important local features, we further design a convolution-based branch in our pipeline (dubbed GLPanoDepth) and fuse global features from cubemap vision transformers at multiple scales. This global-to-local strategy allows us to fully exploit useful global and local features in the panorama, achieving state-of-the-art performance in panoramic depth estimation.
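The abstract motivates cubemap inputs as a way to sidestep equirectangular distortion. The sketch below shows one common way to resample an equirectangular panorama into six cube faces, each of which is an undistorted perspective view with a 90° field of view. It is an illustration under stated assumptions, not the paper's implementation:

- The face order, the axis convention (x right, y up, z forward), the per-face orientation and nearest-neighbour sampling are all assumptions chosen for brevity.
- The helper names `face_directions` and `equirect_to_cubemap` are hypothetical.

```python
import numpy as np

# Assumed convention (not from the paper): x points right, y points up,
# z points forward, i.e. toward the centre column of the panorama.
FACE_NAMES = ("front", "right", "back", "left", "up", "down")


def face_directions(face: int, a: np.ndarray, b: np.ndarray):
    """Unnormalised 3D ray directions for face-plane coords a (rightward), b (downward) in [-1, 1]."""
    one = np.ones_like(a)
    if face == 0:  # front, +z
        return a, -b, one
    if face == 1:  # right, +x
        return one, -b, -a
    if face == 2:  # back, -z
        return -a, -b, -one
    if face == 3:  # left, -x
        return -one, -b, a
    if face == 4:  # up, +y
        return a, one, b
    if face == 5:  # down, -y
        return a, -one, -b
    raise ValueError(f"face index must be in 0..5, got {face}")


def equirect_to_cubemap(pano: np.ndarray, face_size: int) -> np.ndarray:
    """Resample an (H, W, C) equirectangular panorama into (6, F, F, C) cube faces.

    Nearest-neighbour lookup keeps the sketch short; a real pipeline would
    normally use bilinear interpolation instead.
    """
    h, w = pano.shape[:2]
    # Pixel centres of a face, mapped to [-1, 1] on the face plane.
    coords = 2.0 * (np.arange(face_size) + 0.5) / face_size - 1.0
    a, b = np.meshgrid(coords, coords)  # a varies along columns, b along rows
    faces = []
    for face in range(6):
        x, y, z = face_directions(face, a, b)
        lon = np.arctan2(x, z)               # longitude in [-pi, pi]
        lat = np.arctan2(y, np.hypot(x, z))  # latitude in [-pi/2, pi/2]
        # Equirectangular pixel containing each ray: longitude -> column, latitude -> row.
        col = np.floor((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
        row = np.clip(np.floor((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
        faces.append(pano[row, col])
    return np.stack(faces)
```

For example, `equirect_to_cubemap(np.zeros((512, 1024, 3)), 256)` yields an array of shape `(6, 256, 256, 3)`.

A transformer operating on such faces could, for instance, split every face into patches and concatenate the tokens of all six faces into one sequence, so that self-attention relates any two points on the sphere. This is offered only as an assumption about how a global receptive field can arise, not as a description of CViT's exact design.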

Densely connected convolutional network block based autoencoder for panorama map compression

Design of an Enhanced Visual Odometry by Building and Matching Compressive Panoramic Landmarks Online

HDR Image Compression with Convolutional Autoencoder

3D-CNN Autoencoder for Plenoptic Image Compression

Detection and compression of moving objects based on new panoramic image modeling

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

3D Orientation Estimation and Vanishing Point Extraction from Single Panoramas Using Convolutional Neural Network

Asymmetric representation for 3D panoramic video

PADENet: an Efficient and Robust Panoramic Monocular Depth Estimation Network for Outdoor Scenes

Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds

An Overview of Panoramic Video Projection Schemes in the IEEE 1857.9 Standard for Immersive Visual Content Coding

A New Motion Model for Panoramic Video Coding

Panoramic Video Quality Assessment Based on Non-Local Spherical CNN

Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss

ACDNet: Adaptively Combined Dilated Convolution for Monocular Panorama Depth Estimation

GLPanoDepth: Global-to-Local Panoramic Depth Estimation

PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation

Point Cloud Geometry Compression Based on Multi-Layer Residual Structure

Point AE-DCGAN: A Deep Learning Model for 3D Point Cloud Lossy Geometry Compression

Volumetric End-to-End Optimized Compression for Brain Images

A Panoramic Video Face Detection System Design and Implement