Abstract:A visual scene can be recursively parsed into parts and sub-parts. We present a causal model of scene parsing that can synthesize and identify parse trees, predict perceptual complexity, and pass a (limited) Turing test. It describes the causal process of generating a scene as splitting it recursively with vertical and horizontal cuts, modulated by two parameters: (a) Splitting Factor (SF) – a large SF favors splitting a part into more sub-parts, generating a wide and shallow tree; (b) Part Similarity (PS) – a large PS favors evenly splitting a part into sub-parts. Given a scene, it infers the best parsing tree by evaluating the number of parts and their similarities at each partition. It inspires three human experiments beyond RT/accuracy measurements. (1) "Just cut it". Participants freely cut a blank scene into 6 rectangles recursively. With human-generated images, the model removed all free parameters by estimating human SF and PS priors. (2) "Complexity comparison". Some scenes are immediately perceived as more complex than others. Scene complexity can be quantified as "information content", determined by the probability of generating that scene (higher probability carries less information). Participants ranked the complexities of 20 scenes via paired comparisons. The model ranked the same images by computing their information contents. The result revealed a strong correlation between human and model rankings (r2 = 0.85). (3) Turing test. Each scene can be interpreted by multiple parsing trees. Participants viewed a scene and one parsing tree, then reported whether the tree was generated by a human or machine. Two baseline models were introduced: (a) the causal model with non-informative SF and PS priors; (b) a model uniformly sampling a tree from valid ones. Only the causal model with human priors passed Turing test. These results demonstrate how to formalize human scene parsing with a causal model. Meeting abstract presented at VSS 2018

When Are Tree Structures Necessary for Deep Learning of Representations?

Recursive Deep Models for Discourse Parsing

On Tree-Based Neural Sentence Modeling

A Model of Coherence Based on Distributed Sentence Representation.

TreeNet: Learning Sentence Representations with Unconstrained Tree Structure.

Graph-Based Dependency Parsing With Recursive Neural Network

Sentence Modeling with Gated Recursive Neural Network

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

Learning Program Representations with a Tree-Structured Transformer

Semantic Role Labeling Using Recursive Neural Network

A Causal Model of Recursive Scene Parsing in Human Perception

Learning to Compose over Tree Structures via POS Tags

Recursive Subtree Composition in LSTM-Based Dependency Parsing

On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages

Learning Tag Embeddings and Tag-specific Composition Functions in Recursive Neural Network.

Learning to Embed Sentences Using Attentive Recursive Trees.

Tree-Structured Neural Machine for Linguistics-Aware Sentence Generation

Dynamic Compositional Neural Networks over Tree Structure

Mechanisms for handling nested dependencies in neural-network language models and humans

Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion

A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network