Abstract: In this dissertation we present a hierarchical and contextual model for representing image patterns (manmade objects and aerial images) that are highly variant from instance to instance. These types of patterns are difficult to model because objects within the same class may have very different photometric and geometric properties and/or compositions of parts, e.g. teapots may have very different colors, shapes, and locations of their spouts and handles. We hypothesize that these varied visual patterns can be captured by using a novel representation that arranges common primitives of the patterns in a probabilistic hierarchy, thus compactly capturing possible compositional variations, and then enforces contextual constraints on the appearances of the parts, thus modeling the conditional photometric and geometric relationships of the object parts. We combine a Stochastic Context-Free Grammar (SCFG), which captures the long-range compositional variations of a pattern, with a Markov Random Field (MRF), which captures the short-range constraints between neighboring pattern primitives, to create our model. We also present a minimax entropy framework for automatically learning which contextual constraints are most relevant for modeling a type of pattern and estimating their parameters. Finally, we present a novel Markov Chain Monte Carlo (MCMC) algorithm called Clustering Cooperative and Competitive Constraints (C4) for efficiently performing Bayesian inference with our model. C4 is a method for minimizing energy functions defined on graphs that we will use to combine bottom-up and top-down information to find the best interpretation of an image. We show experiments on learning models of a number of manmade object categories and of aerial images and demonstrate that our algorithms automatically learn models that accurately capture the statistical nature of the patterns we are modeling. We also show that our model can be used for inference in new images, allowing it to identify objects in challenging scenarios.
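As a rough illustration of how a grammar and a random field can be combined, the toy Python sketch below samples a part composition from a small SCFG and then scores it with an MRF-style pairwise energy over neighboring parts. Everything in it is an assumption made for illustration: the teapot grammar, its rule probabilities, the one-dimensional part positions, the quadratic spacing penalty, and the names `GRAMMAR`, `sample`, `pairwise_energy`, and `log_score` are hypothetical and are not taken from the dissertation.

```python
import math
import random

# Hypothetical toy SCFG: each nonterminal maps to (probability, expansion) alternatives.
GRAMMAR = {
    "Teapot": [(0.7, ["Body", "Spout", "Handle"]), (0.3, ["Body", "Spout"])],
    "Body": [(1.0, ["round_body"])],
    "Spout": [(0.5, ["straight_spout"]), (0.5, ["curved_spout"])],
    "Handle": [(0.6, ["loop_handle"]), (0.4, ["top_handle"])],
}


def sample(symbol, rng):
    """Expand `symbol` top-down; return (terminals, log-probability of the derivation)."""
    if symbol not in GRAMMAR:  # terminal symbol: nothing left to expand
        return [symbol], 0.0
    alternatives = GRAMMAR[symbol]
    weights = [p for p, _ in alternatives]
    prob, expansion = rng.choices(alternatives, weights=weights, k=1)[0]
    terminals, log_p = [], math.log(prob)
    for child in expansion:
        child_terminals, child_log_p = sample(child, rng)
        terminals += child_terminals
        log_p += child_log_p
    return terminals, log_p


def pairwise_energy(positions, neighbors, preferred_gap=1.0):
    """Assumed MRF-style energy: quadratic penalty when the distance between
    two neighboring parts deviates from `preferred_gap`."""
    return sum(
        (abs(positions[a] - positions[b]) - preferred_gap) ** 2 for a, b in neighbors
    )


def log_score(grammar_log_p, positions, neighbors):
    """Unnormalized log-score: grammar log-probability minus contextual energy."""
    return grammar_log_p - pairwise_energy(positions, neighbors)


rng = random.Random(0)
parts, grammar_log_p = sample("Teapot", rng)
positions = [float(i) for i in range(len(parts))]            # parts placed on a line
neighbors = [(i, i + 1) for i in range(len(parts) - 1)]      # chain-structured neighborhood
print(parts, log_score(grammar_log_p, positions, neighbors))
```

In this sketch the grammar term accounts for which parts appear at all (for example, a teapot with or without a handle), while the pairwise term penalizes implausible spatial relationships between adjacent parts. The score is unnormalized; a real model would also need learned potentials and an inference procedure.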

Reconfigurable models for scene recognition

Scene classification using a hybrid generative/discriminative approach

Spotlight the Negatives: A Generalized Discriminative Latent Model

Learning reconfigurable scene representation by tangram model

A Single-Stream Adaptive Scene Layout Modeling Method for Scene Recognition

Learning Generative Models of Scene Features

Latent Model Ensemble with Auto-localization

A Versatile Framework for Multi-scene Person Re-identification

Expanded Parts Model for Semantic Description of Humans in Still Images

Context-LGM: Leveraging Object-Context Relation for Context-Aware Object Recognition

A hierarchical and contextual model for learning and recognizing highly variant visual categories

A Reconfigurable Tangram Model for Scene Representation and Categorization

Region-Based Representations Revisited

Central and peripheral vision for scene recognition: A neurocomputational modeling exploration

Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation

Scene Recognition by Manifold Regularized Deep Learning Architecture

Learning Compositional Models for Object Categories from Small Sample Sets

SRRM: Semantic Region Relation Model for Indoor Scene Recognition

ROMIR: Robust Multi-View Image Re-Ranking

Exploring 3D-aware Latent Spaces for Efficiently Learning Numerous Scenes