DepthGAN: GAN-based depth generation from semantic layouts

Yidi Li, Jun Xiao, Yiqun Wang, Zhengda Lu
DOI: https://doi.org/10.1007/s41095-023-0350-8
IF: 4.1268
2024-04-27
Computational Visual Media
Abstract: Existing GAN-based generative methods are typically used for semantic image synthesis. We pose the question of whether GAN-based architectures can generate plausible depth maps and find that existing methods have difficulty generating depth maps that reasonably represent 3D scene structure, due to the lack of global geometric correlations. Thus, we propose DepthGAN, a novel method of generating a depth map using a semantic layout as input, to aid construction and manipulation of well-structured 3D scene point clouds. Specifically, we first build a feature generation model with a cascade of semantically aware transformer blocks to obtain depth features with global structural information. For our semantically aware transformer block, we propose a mixed attention module and a semantically aware layer normalization module to better exploit semantic consistency for depth feature generation. Moreover, we present a novel semantically weighted depth synthesis module, which generates adaptive depth intervals for the current scene. We generate the final depth map by using a weighted combination of semantically aware depth weights for different depth ranges. In this manner, we obtain a more accurate depth map. Extensive experiments on indoor and outdoor datasets demonstrate that DepthGAN achieves superior results both quantitatively and visually for the depth generation task.
computer science, software engineering
What problem does this paper attempt to address?
This paper addresses the difficulty that existing methods based on generative adversarial networks (GANs) have in generating depth maps from semantic layouts. In particular, these methods struggle to produce depth maps that plausibly represent 3D scene structure because they do not capture global geometric correlations. Specifically, the paper makes three points:

1. **Limitations of Existing Methods**: Existing methods based on convolutional neural networks (CNNs) have limited receptive fields, so they attend mainly to local information and cannot accurately predict the global geometric relationships between different objects. The generated depth maps are therefore visually incoherent.
2. **Importance of Depth Maps**: As a 2.5D medium, a depth map records the distance between objects and the camera in three-dimensional space, providing a bridge from 2D images to 3D scenes. Generating accurate and plausible depth maps is therefore important for constructing 3D scenes.
3. **Proposal of a New Task**: The paper proposes a new task: generating accurate depth maps using only simple semantic layouts as input, to assist visual designers in constructing 3D scenes.

To address these problems, the paper proposes DepthGAN, a GAN-based depth map generation method. It produces depth features containing global structural information through a cascade of semantically aware transformer blocks, and generates the final depth map through a semantically weighted depth synthesis module. Experiments on indoor and outdoor datasets show that DepthGAN outperforms existing methods both quantitatively and visually.
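
The idea of combining adaptive depth intervals with per-pixel weights can be illustrated with a small sketch. This is not the paper's implementation: the summary does not give the exact formulation, so the code below assumes a generic adaptive-binning scheme in which softmax-normalized logits set the interval widths for a scene, and a second softmax over per-pixel logits weights the interval centers. The function names, the depth range, and the logit values are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def interval_centers(width_logits, d_min, d_max):
    """Turn per-scene logits into adaptive depth intervals.

    Assumption: widths are softmax-normalized so that they
    partition [d_min, d_max]; the centers of the intervals are returned.
    """
    widths = [w * (d_max - d_min) for w in softmax(width_logits)]
    centers, left = [], d_min
    for w in widths:
        centers.append(left + w / 2)
        left += w
    return centers


def pixel_depth(weight_logits, centers):
    """Depth of one pixel as a weighted combination of interval centers."""
    probs = softmax(weight_logits)
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical scene: four equal-width intervals over a 0-10 m range.
centers = interval_centers([0.0, 0.0, 0.0, 0.0], d_min=0.0, d_max=10.0)

# A pixel with uniform weights lands in the middle of the range.
depth = pixel_depth([0.0, 0.0, 0.0, 0.0], centers)
print(centers)  # [1.25, 3.75, 6.25, 8.75]
print(depth)    # 5.0
```

In a real model, both sets of logits would be predicted by the network (in DepthGAN's case, conditioned on the semantic layout), and the weighted sum would be evaluated for every pixel. Because the weights form a probability distribution, the resulting depth always stays within the chosen depth range.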