PCformer: A Parallel Convolutional Transformer Network for 360° Depth Estimation

Chao Xu, Huamin Yang, Cheng Han, Chao Zhang
DOI: https://doi.org/10.1049/cvi2.12144
IF: 1.484
2023-01-01
IET Computer Vision
Abstract: 360° depth estimation has been extensively studied because 360° images provide a full field of view of the surrounding environment and a detailed description of the entire scene. Most well-studied convolutional neural networks (CNNs) for 360° depth estimation extract local features well but, because their receptive field is fixed, fail to capture rich global features from the panorama. PCformer, a parallel convolutional transformer network that combines the benefits of CNNs and transformers, is proposed for 360° depth estimation. Transformers are naturally suited to modelling long-range dependencies and extracting global features, so PCformer can efficiently capture both global dependencies and local spatial features. To fully integrate global and local features, a dual attention fusion module is designed. In addition, a distortion-weighted loss function is designed to reduce the effect of distortion in panoramas. Extensive experiments demonstrate that the proposed method achieves competitive results against state-of-the-art methods on three benchmark datasets. Additional experiments show that the proposed model also has advantages in model complexity and generalisation capability.
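The abstract does not give the form of the distortion-weighted loss, so the following is only a minimal sketch of one common way such a loss can be built, not the authors' formulation. It assumes the panorama is stored in equirectangular projection, where rows near the poles are stretched and each pixel covers a solid angle proportional to the cosine of its latitude. The function names `latitude_weights` and `distortion_weighted_l1` are hypothetical and chosen for illustration.

```python
import math


def latitude_weights(height):
    """Per-row weights proportional to cos(latitude).

    Assumption: equirectangular layout, with row centres running from
    just below +pi/2 (top row) to just above -pi/2 (bottom row).
    """
    return [math.cos((0.5 - (r + 0.5) / height) * math.pi) for r in range(height)]


def distortion_weighted_l1(pred, target):
    """Weighted mean absolute error between two H x W depth maps.

    Both maps are nested lists of floats. Each pixel's error is scaled by
    its row weight, so stretched polar rows contribute less than
    equatorial rows. This is an illustrative stand-in, not the paper's loss.
    """
    weights = latitude_weights(len(pred))
    weighted_error = 0.0
    weight_total = 0.0
    for w, pred_row, target_row in zip(weights, pred, target):
        for p, t in zip(pred_row, target_row):
            weighted_error += w * abs(p - t)
            weight_total += w
    return weighted_error / weight_total
```

Under this weighting, the same absolute depth error costs less when it occurs in a polar row than in an equatorial row, which is the intended effect of down-weighting the most distorted regions. Normalising by the sum of the weights keeps the loss on the same scale as an unweighted mean absolute error.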