Abstract: Subsampling layers play a crucial role in deep nets by discarding a portion of an activation map to reduce its spatial dimensions. This encourages the deep net to learn higher-level representations. Contrary to this motivation, we hypothesize that the discarded activations are useful and can be incorporated on the fly to improve models' prediction. To validate our hypothesis, we propose a search and aggregate method to find useful activation maps to be used at test time. We applied our approach to the task of image classification and semantic segmentation. Extensive experiments over nine different architectures on multiple datasets show that our method consistently improves model test-time performance, complementing existing test-time augmentation techniques. Our code is available at <a class="link-external link-https" href="https://github.com/ca-joe-yang/discard-in-subsampling" rel="external noopener nofollow">https://github.com/ca-joe-yang/discard-in-subsampling</a>.

What problem does this paper attempt to address?

The problem this paper addresses is that subsampling (downsampling) layers in deep neural networks discard activations at test time, even though those activations may help the model's prediction. Specifically, the authors hypothesize that the discarded activations are useful and can be incorporated dynamically to improve prediction performance.

### Problem Background

In computer vision tasks, deep neural networks usually assume that data samples are drawn independently and identically from an unknown distribution. Under this assumption, the same forward pass should be used for training and testing. When the assumption does not hold, however, changing the procedure at test time may lead to better performance. For example, Test-Time Augmentation (TTA) improves model performance by exploiting prior knowledge about data augmentation. But apart from data augmentation, are there other ways to improve model performance at test time?

### Core Problem of the Paper

The authors re-examine the knowledge built into the deep network architecture, especially the roles of the subsampling and pooling layers. They observe that models with subsampling layers do not use all of their activations, because some are discarded. The core question is therefore: **Can the discarded activations be used to improve the model's prediction performance?**

### Solution

To answer this question, the authors propose a search and aggregation framework:

1. **Search for Useful Activation Information**: Define an activation search space in which each state corresponds to an activation map that can be extracted from a subsampling layer by choosing different selection indices. Given a computational budget, a greedy search finds the most promising activation maps.
2. **Aggregate Activation Information**: Combine these activation maps by weighted averaging to generate the final prediction. To aggregate them more effectively, the authors introduce a learned attention mechanism.

### Experimental Verification

The authors conducted extensive experiments on multiple datasets and network architectures, covering image classification and semantic segmentation. The results show that the method consistently improves test-time performance and is complementary to existing TTA methods, which it can further enhance.

### Main Contributions

- Observed that deep networks with subsampling layers discard activations that may be useful for prediction.
- Proposed a framework that improves test-time performance by searching over the activations discarded by standard subsampling and combining them with an attention aggregation module.
- Verified the effectiveness of the proposed method through extensive experiments on various deep networks, for image classification and semantic segmentation.

In summary, this paper addresses the under-utilization of activations discarded by subsampling layers in deep networks and proposes a new method that uses them to improve test-time performance.
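The search-and-aggregate idea summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a single stride-`s` subsampling layer and a hypothetical `toy_head` standing in for the rest of the network. It also scores candidates by prediction confidence and uses those confidences as aggregation weights, where the paper uses a learned attention module. All function names here are illustrative.

```python
import math

def subsample(x, stride, offset):
    """Keep one activation per stride x stride window, at position `offset`.

    offset=(0, 0) mimics ordinary strided subsampling; every other offset
    selects activations that the default forward pass would discard.
    """
    di, dj = offset
    return [row[dj::stride] for row in x[di::stride]]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def toy_head(feat):
    # Hypothetical stand-in for the remaining layers: two class
    # probabilities computed from the mean activation.
    flat = [v for row in feat for v in row]
    mean = sum(flat) / len(flat)
    return softmax([mean, -mean])

def search_and_aggregate(x, stride, budget):
    """Greedily pick up to `budget` selection offsets, then average predictions."""
    default = (0, 0)
    offsets = [(di, dj) for di in range(stride) for dj in range(stride)]
    preds = {o: toy_head(subsample(x, stride, o)) for o in offsets}
    remaining = [o for o in offsets if o != default]
    chosen = [default]  # always keep the standard forward pass
    while len(chosen) < budget and remaining:
        # Assumed scoring rule: take the most confident remaining candidate.
        best = max(remaining, key=lambda o: max(preds[o]))
        remaining.remove(best)
        chosen.append(best)
    # Weighted average of predictions (weights = confidence, an assumption).
    weights = [max(preds[o]) for o in chosen]
    total = sum(weights)
    n_classes = len(preds[default])
    aggregated = [
        sum(w * preds[o][k] for w, o in zip(weights, chosen)) / total
        for k in range(n_classes)
    ]
    return chosen, aggregated

# A 4x4 activation map with values 0..15.
x = [[4 * i + j for j in range(4)] for i in range(4)]
print(subsample(x, 2, (0, 0)))  # [[0, 2], [8, 10]]
print(subsample(x, 2, (1, 1)))  # [[5, 7], [13, 15]]
chosen, probs = search_and_aggregate(x, stride=2, budget=3)
print(chosen, probs)
```

With stride 2, each 2x2 window offers four selection indices, so the default pass uses only one quarter of the activations. The budget caps how many of the alternative maps are evaluated and averaged.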

Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time

Deep Neural Network Acceleration with Sparse Prediction Layers

Rethinking Class Activation Maps for Segmentation: Revealing Semantic Information in Shallow Layers by Reducing Noise

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

A Progressive Subnetwork Searching Framework for Dynamic Inference

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

A Progressive Sub-Network Searching Framework for Dynamic Inference

Extract More from Less: Efficient Fine-Grained Visual Recognition in Low-Data Regimes

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

DASNet: Dynamic Activation Sparsity for Neural Network Efficiency Improvement

Interpret Neural Networks by Extracting Critical Subnetworks

Learning Activation Functions for Sparse Neural Networks

RedTest: Towards Measuring Redundancy in Deep Neural Networks Effectively

Randomly Initialized Subnetworks with Iterative Weight Recycling

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures

Learning Layer-Skippable Inference Network

Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

Scalable Subsampling Inference for Deep Neural Networks

NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization

NAS-LID: Efficient Neural Architecture Search with Local Intrinsic Dimension

DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation