Abstract: By characterizing each image set as a nonsingular covariance matrix on the symmetric positive definite (SPD) manifold, approaches to visual content classification with image sets have made impressive progress. However, the key challenge of large intraclass variability and interclass similarity of the representations remains open. Although several recent studies have mitigated these two problems by jointly learning the embedding mapping and the similarity metric on the original SPD manifold, their inherently shallow and linear feature transformation mechanisms are not powerful enough to capture useful geometric features, especially in complex scenarios. To this end, this article explores a novel approach, termed SPD manifold deep metric learning (SMDML), for image set classification. Specifically, SMDML first selects a prevailing SPD manifold neural network (SPDNet) as the backbone (encoder) to derive a nonlinear SPD matrix representation. To counteract the degradation of structural information during multistage feature embedding, we construct a Riemannian decoder at the end of the encoder, trained with a reconstruction error term (RT), so that the low-dimensional feature manifold generated by the hidden layer captures the pivotal information about the visual data describing the imaged scene. We demonstrate through theory and experiments that it is feasible to replace the Riemannian metric with the Euclidean distance in RT. Then, the ReCov layer is introduced into the established Riemannian network to regularize the local statistical information within each input feature matrix, which enhances the effectiveness of the learning process. We also analyze the activation function used in the ReCov layer in terms of its continuity and the conditions under which it generates positive definite matrices, which is beneficial for network design. Motivated by the fact that the single cross-entropy loss used for training cannot effectively parse the geometric distribution of the deep representations, we finally endow the suggested model with a novel metric learning regularization term. By explicitly incorporating the encoding and processing of the data variations into the network learning process, this term not only derives a powerful Riemannian representation but also helps train an effective classifier. The experimental results show the superiority of the proposed approach on three typical visual classification tasks.
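The first step described in the abstract, representing an image set as a nonsingular covariance matrix and comparing such matrices, can be illustrated with a minimal NumPy sketch. This is not the SMDML implementation: the function names, the identity regularizer `eps`, and the choice of the log-Euclidean distance as the Riemannian-style metric are all assumptions made for illustration. The Frobenius distance is included only to show the kind of Euclidean alternative the abstract says can stand in for the Riemannian metric in RT.

```python
import numpy as np


def set_covariance(features, eps=1e-3):
    """Covariance descriptor of one image set.

    features: array of shape (n_images, d), one feature vector per image.
    eps: assumed regularizer; adding eps * I keeps the matrix nonsingular
         even when n_images is smaller than d.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(features.shape[0] - 1, 1)
    return cov + eps * np.eye(cov.shape[0])


def spd_log(mat):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Log-Euclidean distance between two SPD matrices (assumed metric)."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")


def euclidean_distance(a, b):
    """Plain Frobenius distance, ignoring the manifold geometry."""
    return np.linalg.norm(a - b, ord="fro")


# Two hypothetical image sets with 5-dimensional features.
rng = np.random.default_rng(0)
set_a = rng.normal(size=(20, 5))
set_b = rng.normal(size=(3, 5))  # fewer images than feature dimensions

cov_a = set_covariance(set_a)
cov_b = set_covariance(set_b)
print(log_euclidean_distance(cov_a, cov_b), euclidean_distance(cov_a, cov_b))
```

The second set has fewer images than feature dimensions, so its sample covariance alone would be rank deficient; the assumed `eps` term is what makes the matrix logarithm well defined in that case.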

Visual words assignment via information-theoretic manifold embedding

Visual Words Assignment on A Graph Via Minimal Mutual Information Loss

Learning Visually Aligned Semantic Graph for Cross-Modal Manifold Matching

A Novel Image Classification Method Based on Manifold Learning and Gaussian Mixture Model

Nonlinear Discrete Cross-Modal Hashing for Visual-Textual Data

Learning Exemplar-Represented Manifolds in Latent Space for Classification

A Regularized Approach for Geodesic-Based Semisupervised Multimanifold Learning

Learning Dictionary on Manifolds for Image Classification

Visual word coding based on difference maximization

Large Visual Words For Large Scale Image Classification

Manifold Optimal Experimental Design Via Dependence Maximization for Active Learning

Unifying Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition

Hashing on nonlinear manifolds

Discriminative Sparse Coding on Multi-Manifold for Data Representation and Classification

Learning explicit and implicit visual manifolds by information projection

Towards Semantic Embedding In Visual Vocabulary

Building Descriptive and Discriminative Visual Codebook for Large-Scale Image Applications

SPD Manifold Deep Metric Learning for Image Set Classification

Active learning on manifolds

Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds

Covariance descriptors on a Gaussian manifold and their application to image set classification