Abstract: Computer vision can be considered a highly specialized data collection and data analysis problem. We need to understand the special properties of image data in order to construct statistical models for representing the wide variety of image patterns. One special property that distinguishes vision from other sensory data, such as speech, is that distance or scale plays a profound role in image data. More specifically, visual objects and patterns can appear at a wide range of distances or scales, and the same visual pattern appearing at different distances or scales produces different image data with different statistical properties, and thus entails different regimes of statistical models. In particular, we show that the entropy rate of the image data changes over the viewing distance (as well as the camera resolution). Moreover, the inferential uncertainty changes with viewing distance too. We call these changes information scaling. From this perspective, we examine both empirically and theoretically two prominent and yet largely isolated research themes in the image modeling literature, namely, wavelet sparse coding and Markov random fields. Our results indicate that the two models are appropriate for two different entropy regimes: sparse coding targets the low entropy regime, whereas random fields are suitable for the high entropy regime. Because of information scaling, both models are necessary for representing and interpreting image intensity patterns in the whole entropy range, and information scaling triggers transitions between these two regimes of models. This motivates us to propose a full-zoom primal sketch model that integrates both sparse coding and Markov random fields. In this model, local image intensity patterns are classified into a “sketchable regime” and a “non-sketchable regime” by a sketchability criterion. In the sketchable regime, the image data are represented deterministically by highly parametrized sketch primitives. In the non-sketchable regime, the image data are characterized by Markov random fields whose sufficient statistics summarize computational results from failed attempts at sparse coding. The contribution of our work is twofold. First, information scaling provides a dimension to chart the space of natural images. Second, the full-zoom modeling scheme provides a natural integration of sparse coding and Markov random fields, and thus enables us to develop a new and richer class of statistical models.
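The claim that the entropy rate changes with viewing distance can be illustrated with a toy sketch. This is not the experiment from the paper. It assumes a synthetic binary image whose pixels are switched on independently with a small probability, and it models a larger viewing distance as summing non-overlapping blocks of pixels, which is block averaging up to a constant factor. Because the pixels are independent, the per-pixel histogram entropy coincides with the entropy rate in this toy setting; for real images it would only be a crude proxy. The function names, the image size, and the density value are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def make_sparse_image(size, density, rng):
    """Binary image: each pixel is 'on' independently with probability `density`.

    This i.i.d. model is an assumption of the sketch, not a model of natural images.
    """
    return [[1 if rng.random() < density else 0 for _ in range(size)]
            for _ in range(size)]


def downscale(image, s):
    """Sum non-overlapping s-by-s blocks (block averaging up to a constant factor)."""
    n = len(image) // s
    return [[sum(image[r * s + i][c * s + j] for i in range(s) for j in range(s))
             for c in range(n)]
            for r in range(n)]


def entropy_per_pixel(image):
    """Empirical Shannon entropy, in bits, of the pixel-value histogram."""
    counts = Counter(v for row in image for v in row)
    total = sum(counts.values())
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


rng = random.Random(0)  # fixed seed so the run is repeatable
image = make_sparse_image(128, 0.05, rng)

# Downscale factor 1 is the original image; larger factors mimic larger distances.
entropies = {s: entropy_per_pixel(downscale(image, s)) for s in (1, 2, 4)}
for s, h in entropies.items():
    print(f"downscale factor {s}: {h:.3f} bits per pixel")
```

In this sketch the sparse image starts in a low entropy regime, and each downscaling step mixes more independent pixels into one value, so the per-pixel entropy rises. That is the direction of change the abstract associates with moving from the regime suited to sparse coding toward the regime suited to random fields.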

Modeling Complex Motion: Photometric, Geometric, Dynamic, and Topological Aspects

Analysis and Synthesis of Textured Motion: Particle, Wave and Cartoon Sketch

Analysis and synthesis of textured motion: particles and waves.

Learning spatial-temporal models for understanding actions and events in video

Modeling Complex Motion by Tracking and Editing Hidden Markov Graphs.

Modeling Textured Motion: Particle, Wave and Sketch.

Conceptualization and Modeling of Visual Patterns

A hierarchical and contextual model for learning and recognizing highly variant visual categories

Learning explicit and implicit visual manifolds by information projection

Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network

Modeling Visual Patterns by Integrating Descriptive and Generative Methods.

From Information Scaling of Natural Images to Regimes of Statistical Models

Computing three-dimensional scene from a single image by bottom-up/top-down bayesian inference

Interpretable and Scalable Graphical Models for Complex Spatio-temporal Processes

A Mathematical Theory of Textons and Primal Sketch: Integrating Generative and Descriptive Methods

CCPR 2008 Keynote Speech 3 and Keynote Speech 4

Animate Your Motion: Turning Still Images into Dynamic Videos

Motion Mapping Cognition: A Nondecomposable Primary Process in Human Vision

Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns.

Learning a Probabilistic Topology Discovering Model for Scene Categorization.

Scalable Scene Modeling from Perspective Imaging: Physics-based Appearance and Geometry Inference