SivsFormer: Parallax-Aware Transformers for Single-image-based View Synthesis

Chunlan Zhang, Chunyu Lin, Kang Liao, Lang Nie, Yao Zhao
DOI: https://doi.org/10.1109/vr51125.2022.00022
2022-01-01
Abstract: Single-image-based view synthesis is important for generating 3D scenes and has gained increasing attention in recent years. However, the task is challenging because it requires inferring content beyond what is immediately visible. Previous methods directly predict the unknown views using convolutional neural networks, but the generated views suffer from visually unpleasant holes, deformations, and artifacts. In this paper, we propose a Single-image-based view synthesis transformer (named SivsFormer) for high-quality and realistic view synthesis. In particular, a warping and occlusion handling module is designed to reduce the influence of parallax on the network. Subsequently, a disparity alignment module captures long-range information over the scene and, using soft probabilistic disparity maps, ensures that pixels move in a geometrically correct manner. Moreover, we present a parallax-aware loss function, which explicitly quantifies the magnitude of parallax, to improve the quality of the synthetic images. We conduct extensive experiments on the popular KITTI and Cityscapes datasets. Benefiting from the proposed parallax-aware transformer, our approach achieves superior performance in both quantitative and qualitative evaluations.
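The abstract does not spell out how the soft probabilistic disparity maps are applied. As an illustration only, and not the authors' implementation, the following minimal NumPy sketch shows one common way to use such a map: each pixel carries a probability distribution over a set of candidate horizontal disparities, and the synthesized view is the probability-weighted sum of shifted copies of the input. The function names `shift_horizontal` and `soft_disparity_warp`, the integer disparity candidates, and the zero-filling at the image border are all assumptions made for this example.

```python
import numpy as np


def shift_horizontal(image, d):
    """Shift an (H, W, C) image d pixels to the right (left if d < 0).

    Pixels shifted in from outside the image are zero-filled (an assumption
    of this sketch; a real system would handle these holes explicitly).
    """
    out = np.zeros_like(image)
    w = image.shape[1]
    if d == 0:
        out[:] = image
    elif 0 < d < w:
        out[:, d:] = image[:, :w - d]
    elif -w < d < 0:
        out[:, :w + d] = image[:, -d:]
    return out


def soft_disparity_warp(image, logits, disparities):
    """Warp `image` with a per-pixel soft distribution over disparities.

    image:       (H, W, C) float array, the source view.
    logits:      (D, H, W) float array, unnormalized scores per candidate.
    disparities: sequence of D integer pixel shifts (hypothetical candidates).
    Returns the warped image and the (D, H, W) probability maps.
    """
    # Numerically stable softmax over the disparity axis.
    z = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=0, keepdims=True)

    # Expected image: sum over candidates of probability * shifted source.
    out = np.zeros(image.shape, dtype=np.float64)
    for p, d in zip(probs, disparities):
        out += p[..., None] * shift_horizontal(image, d)
    return out, probs


if __name__ == "__main__":
    img = np.arange(8, dtype=np.float64).reshape(2, 4, 1)
    logits = np.zeros((2, 2, 4))
    logits[1] = 50.0  # put almost all probability on disparity 1
    warped, probs = soft_disparity_warp(img, logits, [0, 1])
    print(warped[..., 0])
```

Because the output is a differentiable weighted sum rather than a hard per-pixel lookup, a formulation like this can be trained end to end; whether SivsFormer uses exactly this form is not stated in the abstract.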