Abstract: Global pooling is one of the most significant operations in many machine learning models and tasks, which works for information fusion and structured data (like sets and graphs) representation. However, without solid mathematical fundamentals, its practical implementations often depend on empirical mechanisms and thus lead to sub-optimal, even unsatisfactory performance. In this work, we develop a novel and generalized global pooling framework through the lens of optimal transport. The proposed framework is interpretable from the perspective of expectation-maximization. Essentially, it aims at learning an optimal transport across sample indices and feature dimensions, making the corresponding pooling operation maximize the conditional expectation of input data. We demonstrate that most existing pooling methods are equivalent to solving a regularized optimal transport (ROT) problem with different specializations, and more sophisticated pooling operations can be implemented by hierarchically solving multiple ROT problems. Making the parameters of the ROT problem learnable, we develop a family of regularized optimal transport pooling (ROTP) layers. We implement the ROTP layers as a new kind of deep implicit layer. Their model architectures correspond to different optimization algorithms. We test our ROTP layers in several representative set-level machine learning scenarios, including multi-instance learning (MIL), graph classification, graph set representation, and image classification. Experimental results show that applying our ROTP layers can reduce the difficulty of the design and selection of global pooling -- our ROTP layers may either imitate some existing global pooling methods or lead to some new pooling layers fitting data better. The code is available at https://github.com/SDS-Lab/ROT-Pooling.

What problem does this paper attempt to address?

The paper addresses two shortcomings of existing global pooling methods, one theoretical and one practical:

1. **Lack of a solid mathematical foundation**: Global pooling operations such as mean-pooling and max-pooling are widely used, but they are poorly explained theoretically. Their design and selection therefore rely on empirical mechanisms, which can lead to sub-optimal or even unsatisfactory performance.
2. **Unclear relationships between pooling methods**: The differences and connections among the many available pooling methods have not been studied thoroughly. Designing or choosing a pooling method is therefore difficult and time-consuming, often comes down to trial and error, and can leave the model with poor generalization ability.

To solve these problems, the paper proposes a new global pooling framework based on Regularized Optimal Transport (ROT). The framework learns an optimal transport plan, that is, a joint distribution over sample indices and feature dimensions, so that the resulting pooling operation maximizes the conditional expectation of the input data. The main contributions of the paper include:

- **Unifying and generalizing existing pooling methods**: The paper shows that most existing pooling methods are special cases of the ROT problem and proposes a more general pooling framework.
- **Introducing learnable parameters**: By making the parameters of the ROT problem learnable, the paper develops a family of Regularized Optimal Transport Pooling (ROTP) layers. These layers are a new kind of deep implicit layer whose architectures correspond to the optimization algorithms used to solve the ROT problem.
- **Improving performance and flexibility**: ROTP layers can be implemented with different regularization terms and optimization algorithms, which makes them flexible and suitable for multiple learning scenarios.

In addition, stacking multiple ROTP layers yields a hierarchical ROTP (HROTP) module that further strengthens the ability to process complex data. In summary, the paper aims to simplify the design of global pooling operations and improve their performance in practical tasks by introducing a pooling framework grounded in optimal transport theory.
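The claim that common pooling operations arise as special cases of a regularized transport problem can be illustrated with a small NumPy sketch. This is not the authors' ROTP implementation (see their repository for that). It assumes an entropy regularizer with weight `eps`, uniform marginals, and a data matrix `X` with feature dimensions as rows and sample indices as columns. The function names `sinkhorn_plan`, `rot_pool`, and `one_sided_pool` are hypothetical. Under these assumptions a very large `eps` reproduces mean-pooling, and dropping the column-marginal constraint gives a row-wise softmax that approaches max-pooling as `eps` shrinks.

```python
import numpy as np


def sinkhorn_plan(X, eps, n_iters=200):
    """Entropic transport plan P (D x N) maximizing <X, P> + eps * H(P).

    Assumption: uniform marginals over feature dimensions (rows)
    and sample indices (columns).
    """
    D, N = X.shape
    p = np.full(D, 1.0 / D)  # target row marginal
    q = np.full(N, 1.0 / N)  # target column marginal
    # Gibbs kernel; subtracting the global max only rescales K
    K = np.exp((X - X.max()) / eps)
    u = np.ones(D)
    v = np.ones(N)
    for _ in range(n_iters):
        u = p / (K @ v)      # match row marginals
        v = q / (K.T @ u)    # match column marginals
    return u[:, None] * K * v[None, :]


def rot_pool(X, eps, n_iters=200):
    """Pool each feature as a P-weighted average over samples."""
    P = sinkhorn_plan(X, eps, n_iters)
    W = P / P.sum(axis=1, keepdims=True)  # conditional weights per feature
    return (X * W).sum(axis=1)


def one_sided_pool(X, eps):
    """Same objective with only the row marginal constrained.

    The optimal plan is then a row-wise softmax of X / eps.
    """
    Z = (X - X.max(axis=1, keepdims=True)) / eps  # stable softmax
    W = np.exp(Z)
    W /= W.sum(axis=1, keepdims=True)
    return (X * W).sum(axis=1)


X = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.0, -1.0]])
mean_like = rot_pool(X, eps=1e6)        # close to X.mean(axis=1)
max_like = one_sided_pool(X, eps=1e-2)  # close to X.max(axis=1)
```

In this sketch `eps` is fixed by hand. In the paper's framework the parameters of the ROT problem are learnable, so a layer can settle between such extremes or move away from them.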

Regularized Optimal Transport Layers for Generalized Global Pooling Operations

LIP: Local Importance-Based Pooling

Wasserstein Pooling for Image Classification

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition

Attentive Pooling with Learnable Norms for Text Representation

Self-Attentive Pooling for Efficient Deep Learning

FlowPool: Pooling Graph Representations with Wasserstein Gradient Flows

Combining Local and Global: Rich and Robust Feature Pooling for Visual Recognition

Expectation pooling: an effective and interpretable pooling method for predicting DNA-protein binding

Pool PaRTI: A PageRank-based Pooling Method for Robust Protein Sequence Representation in Deep Learning

Graph explicit pooling for graph-level representation learning

Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree

Generalized regular spatial pooling for image classification

Feature Robust Optimal Transport for High-dimensional Data

SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering

Rethinking Pooling for Multi-Granularity Features in Aerial-View Geo-Localization

R2FP: Rich and Robust Feature Pooling for Mining Visual Data

Deep Generalized Max Pooling

Geometric Pooling: maintaining more useful information

Group Invariant Global Pooling

Linear Optimal Partial Transport Embedding