Abstract: Vision can be considered a highly specialized data collection and analysis problem. We need to understand the special properties of natural image data in order to construct statistical models and develop statistical methods for representing and recognizing the wide variety of natural image patterns. One fundamental property of natural image data that distinguishes vision from other sensory tasks such as speech recognition is that scale plays a profound role in image formation and interpretation. Specifically, visual objects can appear at a wide range of scales in images because of changes in viewing distance and camera resolution. The same objects appearing at different scales produce different image data with different statistical properties. In particular, we show that the entropy rate of the image data changes over scale, and that the inferential uncertainty changes over scale as well. We call these changes information scaling. We then examine, both empirically and theoretically, two prominent and yet largely isolated classes of image models, namely wavelet sparse coding models and Markov random field models. Our results indicate that the two classes of models are appropriate for two different entropy regimes: sparse coding targets low entropy regimes, whereas Markov random fields are appropriate for high entropy regimes. Because information scaling connects different entropy regimes, both sparse coding and Markov random fields are necessary for representing natural image data, and information scaling triggers transitions between these two regimes. This motivates us to propose a modeling scheme that embraces both regimes of models in a common framework. The contribution of our work is two-fold. First, the study of information scaling provides a unifying perspective on the rich variety of natural image patterns. Second, the modeling scheme that we develop provides a natural integration of different regimes of image models.
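The claim that entropy changes over scale can be illustrated with a toy experiment. The sketch below is not the authors' method or data: it assumes a synthetic binary image of sparse random dots, a hypothetical `downscale_or` operator (a coarse pixel is on if any fine pixel in its block is on), and the marginal per-pixel entropy as a crude stand-in for the entropy rate. Under those assumptions, the per-pixel entropy rises as the image is viewed at coarser scales, moving from a sparse, low entropy regime toward a dense, texture-like one.

```python
import numpy as np

def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) pixel."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def downscale_or(img, s):
    """Hypothetical downscaling operator (an assumption of this sketch):
    a coarse pixel is on if any pixel in its s x s block is on."""
    h, w = img.shape
    h2, w2 = h // s, w // s
    blocks = img[:h2 * s, :w2 * s].reshape(h2, s, w2, s)
    return blocks.max(axis=(1, 3))

def entropy_per_pixel(img):
    """Marginal per-pixel entropy; only a rough proxy for the entropy rate,
    since it ignores dependence between neighboring pixels."""
    return binary_entropy(float(img.mean()))

# Synthetic "scene": sparse dots, about 1% of pixels on (assumed toy data).
rng = np.random.default_rng(0)
fine = (rng.random((512, 512)) < 0.01).astype(np.uint8)

scales = [1, 2, 4, 8]
entropies = [entropy_per_pixel(downscale_or(fine, s)) for s in scales]

for s, h in zip(scales, entropies):
    print(f"downscale factor {s}: {h:.3f} bits per pixel")
```

The marginal entropy is used here only because it is easy to compute; an estimate of the true entropy rate would have to account for spatial dependence between pixels.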

Information-Theoretic Structure For Visual Signal Understanding

Visual words assignment via information-theoretic manifold embedding

Structured Label Inference for Visual Understanding

Visual Words Assignment on A Graph Via Minimal Mutual Information Loss

Direct Alignment with Generalized Correspondences: A Unified Framework for Structure-Based Visual Pose Estimation

Learning Structured Semantic Embeddings for Visual Recognition

Unifying Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition

Visual Feature Coding for Image Classification Integrating Dictionary Structure

Learning explicit and implicit visual manifolds by information projection

Multi-Level structured image coding on high-dimensional image representation

Information-theoretic Dictionary Learning for Image Classification

High-Order Topology Modeling of Visual Words for Image Classification

Deep and Structured Robust Information Theoretic Learning for Image Analysis

Structurally Enhanced Incremental Neural Learning for Image Classification with Subgraph Extraction

More About Covariance Descriptors for Image Set Coding: Log-Euclidean Framework based Kernel Matrix Representation

Inductive Structure Consistent Hashing via Flexible Semantic Calibration

Building Descriptive and Discriminative Visual Codebook for Large-Scale Image Applications

Visual information quantification for object recognition and retrieval

From Information Scaling of Natural Images to Regimes of Statistical Models

Visual word coding based on difference maximization

Learning Dictionary on Manifolds for Image Classification