A System of Configurable 3D Indoor Scene Synthesis Via Semantic Relation Learning

Xinyan Yang, Fei Hu, Long Ye, Zhiming Chang, Jiyin Li
DOI: https://doi.org/10.1016/j.displa.2022.102168
IF: 3.074
2022-01-01
Displays
Abstract: When describing a complicated scene, natural language often carries implicit meaning beyond the literal description, inferred from the context of objects and their relations. Synthesizing a 3D scene from a natural language description can therefore be regarded as an ill-posed problem. To address this challenge, we build a novel system named Text to Scene (T2S) that uses relation learning for language-driven scene synthesis. In this paper, we propose a novel graph-based contextual completion method, Contextual ConvE (CConvE), to enrich the 3D indoor scene, and we visualize the completed graph by arranging 3D models under an object location protocol. We then integrate both components into the T2S system, which synthesizes 3D scenes from text. Given a text, T2S organizes the semantic information it contains into a graph template, completes the graph with CConvE, and visualizes the graph by retrieving and arranging 3D models under the protocol. CConvE is a convolutional neural network that infers object categories and spatial relations through contextual message passing. Experimental results show that CConvE performs competitively with the state-of-the-art approach, and demonstrate that introducing semantic object relation learning into 3D scene synthesis makes the visualized virtual scenes more consistent with real life.
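The abstract describes CConvE only at a high level. As a rough illustration of the graph-completion step, the sketch below implements the scoring function of the standard ConvE link-prediction model (Dettmers et al., 2018), which CConvE's name suggests it extends, and uses it to rank candidate tails for a template triple such as (bed, next_to, ?). It is a minimal sketch under stated assumptions, not the authors' method: the contextual message passing that distinguishes CConvE is not modelled, batch normalization, dropout and bias terms are omitted, and all object names, the relation `next_to`, the embeddings and the helpers `conve_score` and `complete_tail` are hypothetical. The parameters are random and untrained, so the printed ranking is arbitrary.

```python
import math
import random


def reshape(vec, rows, cols):
    """Reshape a flat list into a rows x cols matrix (row-major order)."""
    assert len(vec) == rows * cols
    return [vec[i * cols:(i + 1) * cols] for i in range(rows)]


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation with 'valid' (no) padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(x):
    return x if x > 0.0 else 0.0


def conve_score(e_head, e_rel, e_tail, kernels, W, rows, cols):
    """ConvE-style plausibility of the triple (head, relation, tail), in (0, 1)."""
    # 1. Reshape head and relation embeddings to 2D and stack them vertically.
    image = reshape(e_head, rows, cols) + reshape(e_rel, rows, cols)
    # 2. Convolve with each filter, apply ReLU and flatten the feature maps.
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        features.extend(relu(v) for row in fmap for v in row)
    # 3. Project back to the embedding dimension (one output per row of W).
    hidden = [relu(sum(w * f for w, f in zip(w_row, features))) for w_row in W]
    # 4. Dot product with the candidate tail embedding, then a sigmoid.
    logit = sum(h * t for h, t in zip(hidden, e_tail))
    return 1.0 / (1.0 + math.exp(-logit))


def complete_tail(head, relation, candidates, ent, rel, kernels, W, rows, cols):
    """Rank candidate tails for a template triple (head, relation, ?)."""
    scored = [(c, conve_score(ent[head], rel[relation], ent[c], kernels, W, rows, cols))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rand_vec(n):
    """Random vector standing in for a learned (hypothetical) parameter."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


# Toy, untrained parameters: the ranking below is arbitrary, not a learned result.
random.seed(0)
ROWS, COLS = 2, 4                                   # embedding dim 8, reshaped to 2 x 4
DIM = ROWS * COLS
ent = {name: rand_vec(DIM) for name in ["bed", "nightstand", "lamp", "sofa"]}
rel = {"next_to": rand_vec(DIM)}
kernels = [reshape(rand_vec(9), 3, 3) for _ in range(2)]     # two 3 x 3 filters
n_features = len(kernels) * (2 * ROWS - 2) * (COLS - 2)      # 2 filters x (2 x 2) maps
W = [rand_vec(n_features) for _ in range(DIM)]

ranking = complete_tail("bed", "next_to", ["nightstand", "lamp", "sofa"],
                        ent, rel, kernels, W, ROWS, COLS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a trained system the embeddings, filters and projection matrix would be learned from a corpus of scene graphs, and the top-ranked tail would be added to the graph template before 3D models are retrieved and placed.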