Temporal3D: 2D-to-3D Video Conversion Network with Multi-frame Fusion

Zheyu Zhang, Ronggang Wang
DOI: https://doi.org/10.1109/ctisc54888.2022.9849751
2022-01-01
Abstract: This paper proposes a 2D-to-3D video conversion network that aggregates information from adjacent frames to synthesize the novel view. To address the occlusion problem in the novel view, adjacent frames are used to recover the data that goes missing when the viewpoint changes. In particular, a temporal aggregated attention module is proposed so that the network focuses on the information that is useful for view synthesis. A mask fusion module is introduced to produce the novel view according to the correspondence between each frame and the estimated view. The proposed network is trained on left-right pairs extracted from 3D movies, without requiring ground-truth depth. Experimental results show that the proposed method produces high-quality novel views and outperforms the state-of-the-art approach in both subjective and objective quality.
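The mask fusion idea can be illustrated with a minimal sketch. The abstract does not specify how the masks are computed or normalized, so everything here is an assumption for illustration: the function name `fuse_with_masks`, the array layout, and the use of a per-pixel softmax across frames are hypothetical choices, not the authors' implementation. The sketch assumes each of K adjacent frames has already been turned into a candidate novel view and that the network has produced one mask score map per frame.

```python
import numpy as np


def fuse_with_masks(candidates, mask_logits):
    """Blend per-frame candidate views into one novel view.

    candidates:  array of shape (K, H, W, C), one candidate view per frame.
    mask_logits: array of shape (K, H, W), one unnormalized score per frame
                 and pixel (hypothetical output of a mask branch).
    Returns an array of shape (H, W, C).
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    mask_logits = np.asarray(mask_logits, dtype=np.float64)

    # Assumption: masks are normalized with a softmax over the frame axis,
    # so the weights at every pixel are non-negative and sum to 1.
    shifted = mask_logits - mask_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted sum over frames; the mask is broadcast across channels.
    return (weights[..., None] * candidates).sum(axis=0)


# Two constant candidate views (all zeros and all ones) on a 2x2 image.
views = np.stack([np.zeros((2, 2, 3)), np.ones((2, 2, 3))])
equal = fuse_with_masks(views, np.zeros((2, 2, 2)))
```

With equal scores, every frame contributes equally at each pixel. A pixel that is occluded in one frame but visible in another would instead be given a low score for the occluded frame, so the visible frame dominates the blend there.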