Multi-Scale Feature Fusion for Single Image Novel View Synthesis

Lei Jiang, Gerald Schaefer, Qinggang Meng
DOI: https://doi.org/10.1016/j.neucom.2024.128081
IF: 6
2024-01-01
Neurocomputing
Abstract: Single image novel view synthesis generates target images at different viewpoints from a single input image. Pixel generation methods are one of the main approaches to novel view synthesis, and previous methods typically use only the input image to infer the target image in the new view. However, features from the input image in the source view alone might not be sufficient to generate a good target image, especially when only a single input image is available. In this paper, we fuse features from an input image and a warped image to collaboratively generate pixels in the new view, where the warped image is an intermediate output generated by projecting pixels of the input image onto the target view via an estimated depth. Since the estimated depth and the generated warped image are not perfect, errors are introduced when generating target pixels. To alleviate these errors and to ensure better channel information between the features from the input and warped images, channel attention blocks are employed. In addition, to use skip connections for better novel view synthesis results, encoder features from different layers of the input image are transformed to the target view via multi-resolution depths. Here, instead of downsampling a single full-resolution depth to several lower-resolution depths, we adopt a multi-scale depth estimation network to predict multiple depths at different resolutions. Experimental results on benchmark datasets show that our method gives excellent view synthesis results and outperforms other state-of-the-art novel view synthesis methods.
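The warping step mentioned in the abstract (projecting input pixels onto the target view via an estimated depth) can be illustrated with a standard pinhole reprojection. The following is a minimal sketch, not the authors' implementation: the function name `reproject_pixels`, the shared intrinsics `K`, and the relative pose `R`, `t` are all assumptions introduced for illustration, and the depth is assumed to be strictly positive.

```python
import numpy as np

def reproject_pixels(depth, K, R, t):
    """Map every source pixel to target-view coordinates using per-pixel depth.

    Hypothetical helper for illustration only.
    depth: (H, W) depth of the source view (assumed > 0)
    K:     (3, 3) pinhole intrinsics, assumed identical for both views
    R, t:  (3, 3) rotation and (3,) translation from source to target camera
    Returns (H, W, 2) target pixel coordinates (u, v) and (H, W) target depth.
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Homogeneous pixel coordinates (u, v, 1) for every source pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project: X_src = depth * K^{-1} [u, v, 1]^T
    pts_src = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Change of camera frame: X_tgt = R X_src + t
    pts_tgt = pts_src @ R.T + t
    # Project into the target image: p = K X_tgt, then divide by depth.
    proj = pts_tgt @ K.T
    z = proj[..., 2]
    uv = proj[..., :2] / z[..., None]
    return uv, z
```

A warped image would then be formed by splatting or sampling source colours at these coordinates, which the sketch leaves out. Applying the same function to depth maps at several resolutions (with `K` rescaled accordingly) is one way to picture how encoder features at different layers could be moved to the target view.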