Abstract: This dissertation presents a composite template model, named the And-Or graph, for representing objects with large structural variability. Intuitively, an And-node represents a decomposition of a graphical structure and expands into a set of Or-nodes with associated relations; an Or-node serves as a switch variable pointing to alternative And-nodes. A traversal from the root node of the And-Or graph, called a parse graph, produces a configuration of the terminal nodes (sub-templates) under the (soft and hard) relations inherited from their ancestor nodes. Because the alternatives chosen at different Or-nodes combine multiplicatively, the And-Or graph can generate a large set of constrained configurations with a relatively small number of graph nodes, and thus accounts for large structural variations. The And-Or graph model is tested on tasks such as modeling and sketching human faces and clothes. A hierarchical-compositional model of human faces is built as a three-layer And-Or graph. Faces are represented hierarchically: the first layer treats each face as a whole; the second layer refines the local facial parts jointly as a set of individual templates; the third layer further divides the face into 16 zones and models detailed facial features such as eye corners, marks, or wrinkles. Transitions between the layers are decided by the minimum description length (MDL) principle, given the complexity of the input face image. Diverse face representations are formed by drawing from dictionaries of global faces, parts, and skin detail features. A sketch captures the most informative part of a face in a much more concise and potentially more robust representation. However, generating good facial sketches is extremely challenging because of the rich facial details and large structural variations, especially in high-resolution images. The representational power of our generative model is demonstrated by reconstructing high-resolution face images and generating cartoon facial sketches.
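The combinatorial claim and the MDL layer choice can be made concrete with a small sketch. The Python below is a toy under stated assumptions. It models And-, Or-, and terminal nodes and counts the parse graphs an And-Or graph can produce: an And-node multiplies its children's counts, and an Or-node sums them. It then samples one parse graph by choosing a branch at every Or-node, and it picks a representation layer by the MDL rule of smallest total description length. The `Node` class, the node names, and the bit costs in `layer_costs` are hypothetical illustrations. The soft and hard relations between nodes are omitted entirely, so this is not the dissertation's implementation.

```python
import random
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    name: str
    kind: str  # "and", "or", or "terminal"
    children: list = field(default_factory=list)


def leaf(name: str) -> Node:
    return Node(name, "terminal")


def count_configurations(node: Node) -> int:
    """Number of distinct parse graphs (terminal configurations) rooted at node."""
    if node.kind == "terminal":
        return 1
    counts = [count_configurations(child) for child in node.children]
    # And-node: one configuration from every child, combined -> product.
    # Or-node: exactly one alternative child is chosen -> sum.
    return prod(counts) if node.kind == "and" else sum(counts)


def sample_parse_graph(node: Node, rng: random.Random) -> list:
    """Traverse from node, picking one branch at each Or-node; return the terminals reached."""
    if node.kind == "terminal":
        return [node.name]
    if node.kind == "or":
        return sample_parse_graph(rng.choice(node.children), rng)
    terminals = []
    for child in node.children:
        terminals.extend(sample_parse_graph(child, rng))
    return terminals


def select_layer(layer_costs: dict) -> str:
    """MDL rule: choose the layer with the smallest model bits + residual bits."""
    return min(layer_costs, key=lambda layer: sum(layer_costs[layer]))


# Hypothetical toy face graph: one And-node over three Or-nodes (part switches).
face = Node("face", "and", [
    Node("eyes", "or", [leaf("eyes_open"), leaf("eyes_closed"), leaf("eyes_squint")]),
    Node("nose", "or", [leaf("nose_a"), leaf("nose_b")]),
    Node("mouth", "or", [leaf("mouth_smile"), leaf("mouth_neutral"), leaf("mouth_open")]),
])

print(count_configurations(face))  # 18 configurations from 12 nodes
print(sample_parse_graph(face, random.Random(0)))

# Hypothetical (model bits, residual bits) per layer for one input image.
layer_costs = {"global": (50, 400), "parts": (120, 210), "zones": (300, 150)}
print(select_layer(layer_costs))  # parts
```

In this toy the gap between nodes and configurations is small. Adding another Or-node with k alternatives under the root adds only k + 1 nodes, but it multiplies the configuration count by k. That multiplicative growth is the source of the compactness the abstract describes.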
Our model is useful for a wide variety of applications, including recognition, non-photorealistic rendering, super-resolution, and low-bit-rate face coding. Cloth modeling and recognition is an important and challenging problem in both vision and graphics, with tasks such as dressed-human recognition and tracking, and human sketching and portraiture. We build an And-Or graph model to represent different clothing configurations, such as T-shirts and jackets. In a supervised learning phase, we ask an artist to draw sketches on a set of images of dressed people, and we decompose the sketches into categories of cloth and body components: collars, shoulders, cuffs, hands, pants, shoes, etc. Each component has a number of distinct sub-templates (sub-graphs). An algorithm that integrates bottom-up proposals with top-down information is proposed to infer the composite clothing template efficiently from an image.
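The sketch below illustrates one way to combine bottom-up proposals with top-down information. It scores every assignment of one sub-template per cloth component as the sum of bottom-up proposal scores and top-down pairwise compatibility terms, and then keeps the best assignment. It is a brute-force toy built on assumed inputs. The component and sub-template names, the numbers in `bottom_up` and `compat`, and the additive scoring rule are all hypothetical. It is not the dissertation's inference algorithm.

```python
from itertools import product

# Hypothetical bottom-up proposal scores (e.g. detector log-likelihoods) for
# the candidate sub-templates of each cloth component.
bottom_up = {
    "collar": {"round_neck": 0.6, "v_neck": 0.5, "lapel": 0.4},
    "shoulder": {"short_sleeve": 0.5, "long_sleeve": 0.7},
    "cuff": {"none": 0.3, "buttoned": 0.6},
}

# Hypothetical top-down compatibility terms between sub-templates of two
# components (positive = likely together, negative = unlikely together).
compat = {
    ("shoulder", "short_sleeve", "cuff", "none"): 1.0,
    ("shoulder", "short_sleeve", "cuff", "buttoned"): -2.0,
    ("collar", "lapel", "shoulder", "long_sleeve"): 0.8,
}


def pair_score(a: str, ta: str, b: str, tb: str) -> float:
    """Top-down term for one pair of components; pairs not listed contribute 0."""
    return compat.get((a, ta, b, tb), 0.0) + compat.get((b, tb, a, ta), 0.0)


def infer(components: list) -> tuple:
    """Exhaustively pick one sub-template per component, maximizing bottom-up + top-down score."""
    best, best_score = None, float("-inf")
    for choice in product(*(bottom_up[c] for c in components)):
        assignment = dict(zip(components, choice))
        # Bottom-up evidence for each chosen sub-template.
        score = sum(bottom_up[c][t] for c, t in assignment.items())
        # Top-down compatibility for every pair of components.
        for i, a in enumerate(components):
            for b in components[i + 1:]:
                score += pair_score(a, assignment[a], b, assignment[b])
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


best, score = infer(["collar", "shoulder", "cuff"])
print(best, round(score, 2))
# {'collar': 'lapel', 'shoulder': 'long_sleeve', 'cuff': 'buttoned'} 2.5
```

Taking the per-component bottom-up maximum alone would pick `round_neck`, `long_sleeve`, and `buttoned`. The top-down term switches the collar to `lapel` because that sub-template is compatible with long sleeves. Exhaustive search grows exponentially with the number of components, so this only stands in for the efficient inference the abstract refers to.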
