Modeling Complex Motion: Photometric, Geometric, Dynamic, and Topological Aspects

Song-Chun Zhu, Yizhou Wang
2005-01-01
Abstract: Natural scenes contain a huge number of motion patterns generated by various stochastic processes, so how to represent and model these diverse visual patterns, and how to learn and compute them efficiently, are fundamental problems in computer vision. In this thesis, we present a unified theory with a generative statistical learning and inference framework to analyze complex visual patterns in both motion and scale-space. The goal of this research is to compute the semantic content of image sequences and represent it with symbolic graphs using a generic visual vocabulary learned from natural images. Based on this generic graph representation, four important modeling issues are addressed: photometric, geometric, dynamic, and topological. These four aspects span sub-dimensions in image space. The unified framework aims to answer questions like "What do we perceive when we look at a video sequence?" It tries to identify the perceptually meaningful basic elements and transitions in image sequences, model the elements' attributes and interactions in time-space and scale-space, and infer the hidden dynamic graph structures. We augment the model complexity progressively to learn various complicated visual patterns. We use three challenging examples to illustrate learning and inference on these four aspects. Example (1): Textured motion modeling. Example (2): Topological changes in complex motion modeling. Example (3): Perceptual transitions in scale-space. The integrated model is generic and can be applied to many vision applications, such as tracking, video annotation, motion segmentation, document image analysis, surveillance, and automatic cartoon animation generation. Model learning and inference are achieved using Markov chain Monte Carlo (MCMC) sampling methods.