Abstract: This dissertation presents a composite template model, named the And-Or graph, for representing objects with large structural variability. Intuitively, an And-node represents a decomposition of a graphical structure: it expands into a set of Or-nodes with associated relations. An Or-node serves as a switch variable pointing to alternative And-nodes. A traversal from the root node of the And-Or graph, called a parse graph, produces a configuration of the terminal nodes (sub-templates) under the soft and hard relations inherited from their ancestor nodes. Because the choices made at different Or-nodes combine multiplicatively, the And-Or graph can generate a large set of constrained configurations with a relatively small number of graph nodes, and thus accounts for large structural variations. The And-Or graph model is tested on tasks such as modeling and sketching human faces and clothes.

We build a hierarchical-compositional model of human faces as a three-layer And-Or graph. Faces are represented hierarchically. The first layer treats each face as a whole. The second layer refines the local facial parts jointly as a set of individual templates. The third layer further divides the face into 16 zones and models detailed facial features such as eye corners, marks, and wrinkles. Transitions between the layers are determined by the minimum description length (MDL) criterion, given the complexity of the input face image. Diverse face representations are formed by drawing from dictionaries of global faces, parts, and skin-detail features.

A sketch captures the most informative part of a face in a much more concise and potentially more robust representation. However, generating good facial sketches is extremely challenging because of the rich facial details and large structural variations, especially in high-resolution images. We demonstrate the representational power of our generative model by reconstructing high-resolution face images and generating cartoon facial sketches.
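To make the And-Or graph construction concrete, here is a minimal Python sketch under simplifying assumptions. The graph is a tree, each Or-node picks one alternative uniformly at random, and the soft and hard relations are omitted entirely. The class names (`AndNode`, `OrNode`, `Terminal`) and the toy face are hypothetical illustrations, not the dissertation's actual model. The toy face has four parts with five sub-templates each. Sampling a parse graph collects the selected terminals, and counting configurations shows how a small graph covers many distinct faces.

```python
import random
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Terminal:
    """Leaf node: one sub-template (e.g. a particular eye template)."""
    name: str


@dataclass
class AndNode:
    """Decomposition: every child is expanded together."""
    name: str
    children: List["Node"]


@dataclass
class OrNode:
    """Switch variable: exactly one alternative is selected."""
    name: str
    alternatives: List["Node"]


Node = Union[Terminal, AndNode, OrNode]


def sample_parse(node: Node, rng: random.Random) -> List[str]:
    """Sample one parse graph and return the names of the selected terminals."""
    if isinstance(node, Terminal):
        return [node.name]
    if isinstance(node, AndNode):
        selected: List[str] = []
        for child in node.children:
            selected.extend(sample_parse(child, rng))
        return selected
    # Or-node: choose one alternative uniformly (an assumption; a learned
    # model would use switch probabilities).
    return sample_parse(rng.choice(node.alternatives), rng)


def count_configs(node: Node) -> int:
    """Number of distinct terminal configurations of a tree-shaped graph."""
    if isinstance(node, Terminal):
        return 1
    if isinstance(node, AndNode):
        total = 1
        for child in node.children:
            total *= count_configs(child)  # And-node: choices multiply
        return total
    return sum(count_configs(alt) for alt in node.alternatives)  # Or-node: add


def count_nodes(node: Node) -> int:
    """Total number of nodes in the tree."""
    if isinstance(node, Terminal):
        return 1
    kids = node.children if isinstance(node, AndNode) else node.alternatives
    return 1 + sum(count_nodes(k) for k in kids)


# Toy face: one And-node over four Or-nodes, each with five hypothetical sub-templates.
parts = ["brows", "eyes", "nose", "mouth"]
face = AndNode("face", [OrNode(p, [Terminal(f"{p}_{i}") for i in range(5)]) for p in parts])

print(count_nodes(face), count_configs(face))  # 25 625
print(sample_parse(face, random.Random(0)))
```

Counts multiply at And-nodes and add at Or-nodes. Adding one more Or-node with five alternatives under the root would therefore multiply the number of configurations by five while adding only six nodes.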
Our model is useful for a wide variety of applications, including recognition, non-photorealistic rendering, super-resolution, and low-bit-rate face coding.

Clothes modeling and recognition is an important and challenging problem in both vision and graphics. It underlies tasks such as recognizing and tracking dressed people and producing human sketches and portraits. We build an And-Or graph model to represent different clothing configurations, such as T-shirts and jackets. In a supervised learning phase, an artist draws sketches on a set of images of dressed people. We then decompose the sketches into categories of cloth and body components: collars, shoulders, cuffs, hands, pants, shoes, etc. Each component has a number of distinct sub-templates (sub-graphs). We propose an algorithm that integrates bottom-up proposals with top-down information to infer the composite clothes template efficiently from an image.
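The idea of combining bottom-up proposals with top-down information can be illustrated with a small Python sketch. This is a hypothetical simplification, not the algorithm proposed in the dissertation, and it makes three assumptions:

- Bottom-up detectors supply a score for each candidate sub-template of each component.
- The top-down knowledge is reduced to pairwise compatibility bonuses between sub-templates.
- Inference is exhaustive search over the pruned proposals.

All component names, sub-template names, and scores are made up for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical bottom-up evidence: component -> {sub-template: detection score}
BottomUp = Dict[str, Dict[str, float]]
# Hypothetical top-down prior: unordered pair of sub-templates -> compatibility bonus
Compat = Dict[FrozenSet[str], float]


def propose(bottom_up: BottomUp, k: int) -> Dict[str, List[str]]:
    """Bottom-up step: keep the k highest-scoring sub-templates for each component."""
    return {
        comp: sorted(scores, key=scores.get, reverse=True)[:k]
        for comp, scores in bottom_up.items()
    }


def infer(bottom_up: BottomUp, compat: Compat, k: int = 2) -> Tuple[Dict[str, str], float]:
    """Top-down step: pick the joint configuration maximizing evidence plus compatibility."""
    proposals = propose(bottom_up, k)
    comps = list(proposals)
    best: Optional[Tuple[str, ...]] = None
    best_score = float("-inf")
    for choice in product(*(proposals[c] for c in comps)):
        # Evidence term: sum of bottom-up scores of the chosen sub-templates.
        score = sum(bottom_up[c][t] for c, t in zip(comps, choice))
        # Prior term: compatibility between every pair of chosen sub-templates.
        for i in range(len(choice)):
            for j in range(i + 1, len(choice)):
                score += compat.get(frozenset((choice[i], choice[j])), 0.0)
        if score > best_score:
            best, best_score = choice, score
    return dict(zip(comps, best)), best_score


bottom_up = {
    "collar": {"tshirt_collar": 1.0, "jacket_collar": 0.9, "shirt_collar": 0.1},
    "cuff": {"short_sleeve": 0.8, "long_cuff": 0.7, "rolled_cuff": 0.2},
}
compat = {
    frozenset(("jacket_collar", "long_cuff")): 0.5,
    frozenset(("tshirt_collar", "long_cuff")): -1.0,
}
print(infer(bottom_up, {}))      # bottom-up evidence only
print(infer(bottom_up, compat))  # with top-down compatibility
```

Bottom-up evidence alone favors a T-shirt collar with short sleeves (score 1.8). The compatibility term instead makes the jacket collar with long cuffs the best joint configuration (score about 2.1). Exhaustive search is only feasible for a handful of components, so a realistic clothes model would need a more efficient search.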