Abstract: In this paper, we present a Stochastic Scene Grammar (SSG) for parsing 2D indoor images into 3D scene layouts. Our grammar model integrates object functionality, 3D object geometry, and 2D image appearance in a Function-Geometry-Appearance (FGA) hierarchy. In contrast to the prevailing approach in the literature, which recognizes scenes and detects objects through appearance-based classification using machine learning techniques, our method takes a different perspective on scene understanding and recognizes objects and scenes by reasoning about their functionality. Functionality is an essential property that often defines the categories of objects and scenes and determines the design of their geometry and layout. For example, a sofa is for people to sit on comfortably, and a kitchen is a space in which people prepare food using various objects. Our SSG formulates object functionality and the contextual relations between objects and imagined human poses as a joint probability distribution over the FGA hierarchy. The hierarchy includes both functional concepts (the scene category, functional groups, functional objects, functional parts) and geometric entities (3D/2D/1D shape primitives). The decomposition of the grammar terminates at lines and regions detected bottom-up. We use a Markov chain Monte Carlo (MCMC) algorithm to optimize the Bayesian posterior probability, and the output parse tree includes a 3D description of the 2D image in the FGA hierarchy.
Experimental results on two challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing from traditional scene segmentation, labeling, and 3D reconstruction to functional object recognition, but also yields improved overall performance.

Yibiao Zhao, University of California, Los Angeles (UCLA), USA. E-mail: ybzhao@ucla.edu, www.yibiaozhao.com
Song-Chun Zhu, University of California, Los Angeles (UCLA), USA. E-mail: sczhu@stat.ucla.edu, http://www.stat.ucla.edu/~sczhu
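To make the inference step concrete, the following is a minimal sketch of MCMC search for a maximum a posteriori labeling. It is a toy illustration, not the SSG model itself. The box heights, label set, typical heights, Gaussian likelihood, and chair-table context bonus are all hypothetical values assumed for this example, and the names `log_posterior` and `mcmc_map` are introduced here rather than taken from the paper.

```python
import math
import random

# Hypothetical toy input: heights (in meters) of three detected 3D boxes.
BOX_HEIGHTS = [0.45, 0.75, 2.0]
LABELS = ["chair", "table", "wardrobe"]
# Assumed typical height per functional label (illustrative numbers only).
TYPICAL_HEIGHT = {"chair": 0.45, "table": 0.75, "wardrobe": 2.0}
SIGMA = 0.1  # assumed standard deviation of the height likelihood


def log_posterior(parse):
    """Unnormalized log posterior of a labeling (one label per box)."""
    # Likelihood: Gaussian penalty when box geometry disagrees with the label.
    log_lik = sum(
        -((h - TYPICAL_HEIGHT[label]) ** 2) / (2 * SIGMA ** 2)
        for h, label in zip(BOX_HEIGHTS, parse)
    )
    # Prior: a simple contextual bonus when a chair and a table co-occur.
    log_prior = 1.0 if "chair" in parse and "table" in parse else 0.0
    return log_lik + log_prior


def mcmc_map(n_iter=2000, seed=0):
    """Metropolis sampling over labelings; returns the best state visited."""
    rng = random.Random(seed)
    current = [rng.choice(LABELS) for _ in BOX_HEIGHTS]
    best = list(current)
    for _ in range(n_iter):
        # Symmetric proposal: relabel one randomly chosen box.
        proposal = list(current)
        proposal[rng.randrange(len(proposal))] = rng.choice(LABELS)
        log_alpha = log_posterior(proposal) - log_posterior(current)
        # Accept uphill moves always, downhill moves with probability exp(log_alpha).
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            current = proposal
        if log_posterior(current) > log_posterior(best):
            best = list(current)
    return best
```

Calling `mcmc_map()` returns the highest-scoring labeling found during the run. In the full model the state would be a parse tree over the FGA hierarchy rather than a flat list of labels, and the proposal moves would be correspondingly richer.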
