Abstract: Previous work shows that global covariance pooling (GCP) has great potential to improve deep architectures, especially on visual recognition tasks, where post-normalization of GCP plays a very important role in final performance. Although several post-normalization strategies have been studied, these methods pay closer attention to the effect of normalization on covariance representations than on the whole GCP network, and their effectiveness requires further understanding. Meanwhile, existing effective post-normalization strategies (e.g., matrix power normalization) usually suffer from high computational complexity (e.g., O(d^3) for d-dimensional inputs). To address these issues, this work first analyzes the effect of post-normalization from the perspective of training GCP networks. In particular, we show for the first time that effective post-normalization strikes a good trade-off between representation decorrelation and information preservation for GCP, which are crucial for alleviating over-fitting and increasing the representation ability of deep GCP networks, respectively. Based on this finding, we improve existing post-normalization methods with some small modifications, providing further support for our observation. Furthermore, this finding encourages us to propose a novel pre-normalization method for GCP (namely DropCov), which applies an adaptive channel dropout to features right before GCP, aiming to reach the trade-off between representation decorrelation and information preservation in a more efficient way. DropCov has only linear complexity, O(d), and incurs no cost at inference.
Extensive experiments on various benchmarks (i.e., ImageNet-1K, ImageNet-C, ImageNet-A, Stylized-ImageNet, and iNat2017) show that DropCov is superior to its counterparts in terms of efficiency and effectiveness, and that it provides a simple yet effective way to improve the performance of deep architectures, including both deep convolutional neural networks (CNNs) and vision transformers (ViTs).
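
To make the contrast between post-normalization and pre-normalization concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes features arranged as a (d, n) matrix of d channels over n spatial positions, uses an eigendecomposition for matrix power normalization (the O(d^3) step), and substitutes a plain channel dropout with a fixed probability for the adaptive rule, which the abstract does not specify. The function names `covariance_pool`, `matrix_power_normalize`, and `channel_dropout` are hypothetical.

```python
import numpy as np


def covariance_pool(x):
    """Global covariance pooling: (d, n) features -> (d, d) covariance."""
    centered = x - x.mean(axis=1, keepdims=True)
    return centered @ centered.T / x.shape[1]


def matrix_power_normalize(cov, alpha=0.5):
    """Post-normalization via eigendecomposition, which costs O(d^3)."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    return (eigvecs * eigvals**alpha) @ eigvecs.T


def channel_dropout(x, drop_prob, rng, training=True):
    """Pre-normalization stand-in: zero whole channels before pooling.

    Assumption: drop_prob is fixed here; the adaptive choice of the
    dropout rate described in the abstract is not reproduced.
    """
    if not training or drop_prob == 0.0:
        return x  # identity at inference, so no extra cost
    keep = rng.random(x.shape[0]) >= drop_prob  # one draw per channel, O(d)
    return x * keep[:, None] / (1.0 - drop_prob)


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 49))  # d = 8 channels, n = 49 positions

post = matrix_power_normalize(covariance_pool(features))    # post-normalized
pre = covariance_pool(channel_dropout(features, 0.5, rng))  # dropout, then GCP
```

In this sketch a dropped channel contributes an all-zero row and column to the pooled covariance, and calling `channel_dropout` with `training=False` returns the features unchanged.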

DropCov: A Simple yet Effective Method for Improving Deep Architectures
