Abstract: In this chapter we present a method for learning a compositional model, within a minimax entropy framework, for object categories with large intra-class variance. The model combines the flexibility of a stochastic context-free grammar (SCFG), which accounts for variation in object structure, with the neighborhood constraints of a Markov random field (MRF), which enforce spatial context. We learn the model through a generalized minimax entropy framework that accounts for the dynamic structure of the hierarchical model. We first learn the SCFG parameters from the frequencies of object parts, then pursue spatial relations in order of greatest information gain. The learned model can generalize from a small set of training samples (n < 100) to generate a combinatorially large number of novel instances by stochastic sampling. To verify the learning method and the model's performance, we plot the KL divergence as the algorithm proceeds and show that samples from the model become more realistic as more spatial relations are added. We also show the model accurately predicting missing or undetected parts for top-down recognition, along with preliminary results showing that the model can learn a large space of category appearances from a very small number of training samples (n < 15). This process is similar to “recognition-by-components”, a theory postulating that biological vision systems recognize objects as compositions drawn from a dictionary of commonly appearing 3D structures. Finally, we discuss a compositional boosting algorithm for inference and show examples of its use in object recognition.

This article is a chapter from the book Object Categorization: Computer and Human Vision Perspectives, edited by Sven Dickinson, Ales Leonardis, Bernt Schiele, and Michael J. Tarr (Cambridge University Press).

University of California, Los Angeles, Los Angeles, CA; Lotus Hill Research Institute, EZhou, China.
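As a rough illustration of the first learning step named in the abstract (learning SCFG parameters from the frequencies of object parts), the sketch below estimates a branching probability for each choice point as a relative frequency and then draws a novel configuration by stochastic sampling. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the flat dictionary representation of a parse, and the toy clock data are all hypothetical, each choice point is treated as independent, and the spatial relations (the MRF part of the model) are left out entirely.

```python
import random
from collections import Counter, defaultdict


def estimate_branch_probs(parses):
    """Estimate branching probabilities as relative frequencies.

    `parses` is a list of dicts, each mapping a choice point (a grammar
    node with alternative expansions) to the alternative observed in one
    training sample. This flat representation is an assumption made for
    illustration only.
    """
    counts = defaultdict(Counter)
    for parse in parses:
        for node, branch in parse.items():
            counts[node][branch] += 1
    probs = {}
    for node, counter in counts.items():
        total = sum(counter.values())
        probs[node] = {branch: c / total for branch, c in counter.items()}
    return probs


def sample_configuration(probs, rng):
    """Draw one alternative per choice point from the learned frequencies."""
    config = {}
    for node, dist in probs.items():
        branches = list(dist)
        weights = [dist[b] for b in branches]
        config[node] = rng.choices(branches, weights=weights, k=1)[0]
    return config


# Hypothetical toy training set: four annotated clocks.
training = [
    {"hands": "two_hands", "frame": "round"},
    {"hands": "two_hands", "frame": "square"},
    {"hands": "three_hands", "frame": "round"},
    {"hands": "two_hands", "frame": "round"},
]

probs = estimate_branch_probs(training)
sample = sample_configuration(probs, random.Random(0))
print(probs)
print(sample)
```

Even in this toy setting, the learned frequencies assign nonzero probability to a combination that never appears in the training set (three hands with a square frame), which is a small-scale version of the combinatorial generalization the abstract describes.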

Rates for Inductive Learning of Compositional Models

Towards a Unified Compositional Model for Visual Pattern Modeling

Compositional Substitutivity of Visual Reasoning for Visual Question Answering

What makes Models Compositional? A Theoretical View: With Supplement

Compositional diversity in visual concept learning

A Survey on Compositional Learning of AI Models: Theoretical and Experimental Practices

Compositional Zero-shot Learning Via Progressive Language-based Observations

A Study of Compositional Generalization in Neural Models

Learning to Infer Unseen Single-/Multi-Attribute-Object Compositions with Graph Networks

Provable Compositional Generalization for Object-Centric Learning

Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation

Compositional Generalization by Learning Analytical Expressions

Learning Compositional Models for Object Categories from Small Sample Sets

Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings

Interpretable Composition Attribution Enhancement for Visio-linguistic Compositional Understanding

COLA: A Benchmark for Compositional Text-to-image Retrieval

Towards Understanding the Relationship between In-context Learning and Compositional Generalization

Learning to Infer Unseen Attribute-Object Compositions

Flexible Compositional Learning of Structured Visual Concepts

Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language