Learning and-or templates for object recognition by information projection

Song-Chun Zhu, Zhangzhang Si
2011-01-01
Abstract: Finding statistical models for the bewildering varieties of visual patterns in natural scenes, such as object patterns and texture patterns, is at the core of understanding the mystery of vision. Generative image models, which are automatically learned from observed image examples, help us understand the underlying structure of the high-dimensional image space. They also provide powerful schemes for machine vision tasks such as object recognition and detection. In this work, I mainly focus on learning probabilistic generative image models as hierarchical AND-OR Templates (AOT). More specifically, the proposed AND-OR Templates have the following characteristics, which are advantageous in representing visual objects: (1) Hierarchical composition (AND). An object is usually composed of several constituent parts (e.g., a person is composed of head, body, arms and feet) that are relatively independent of each other. The parts can be further decomposed into smaller parts. (2) Hierarchical coarse-to-fine deformation (Continuous/geometric OR). For example, a person can form a complicated pose, and its articulation can be represented as movements of larger body parts at a coarse level, together with movements of sub-parts within each body part, and so on. (3) Multiple ways of composition (Discrete/structural OR). For example, a person may have small eyes or large eyes.
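The three characteristics listed in the abstract can be pictured as three node types in a tree. The following is only a minimal sketch under stated assumptions, not the learning method of this work: the class names (`Leaf`, `And`, `Or`, `Deform`), the toy scoring functions, and the sum/max scoring rule (add scores over AND children, take the best choice at OR nodes) are all hypothetical illustrations of how such a template might be evaluated at a position.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    """Terminal part. `score` is a hypothetical local matching function."""
    name: str
    score: Callable[[int, int], float]


@dataclass
class And:
    """Hierarchical composition (AND): all children are present."""
    name: str
    children: List["Node"]


@dataclass
class Or:
    """Discrete/structural OR: exactly one alternative is chosen."""
    name: str
    children: List["Node"]


@dataclass
class Deform:
    """Continuous/geometric OR: the child may shift by an allowed offset."""
    child: "Node"
    shifts: List[Tuple[int, int]]


Node = Union[Leaf, And, Or, Deform]


def best_score(node: Node, x: int, y: int) -> float:
    """Assumed scoring rule: sum over AND, max over both kinds of OR."""
    if isinstance(node, Leaf):
        return node.score(x, y)
    if isinstance(node, And):
        return sum(best_score(c, x, y) for c in node.children)
    if isinstance(node, Or):
        return max(best_score(c, x, y) for c in node.children)
    if isinstance(node, Deform):
        return max(best_score(node.child, x + dx, y + dy)
                   for dx, dy in node.shifts)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Toy template: a person has eyes (small OR large, allowed to shift) AND a body.
small_eyes = Leaf("small eyes", lambda x, y: 1.0 if (x, y) == (1, 0) else 0.0)
large_eyes = Leaf("large eyes", lambda x, y: 2.0 if (x, y) == (0, 1) else 0.0)
body = Leaf("body", lambda x, y: 0.5)

eyes = Deform(Or("eyes", [small_eyes, large_eyes]), [(0, 0), (1, 0), (0, 1)])
person = And("person", [eyes, body])

print(best_score(person, 0, 0))
```

In this toy setup the geometric OR tries each allowed shift, the structural OR picks the better eye variant at that shift, and the AND node adds the result to the body score. A learned template would replace the hand-written scoring functions with responses estimated from training images.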