Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time

Chiao-An Yang, Ziwei Liu, Raymond A. Yeh
2024-10-02
Abstract: Subsampling layers play a crucial role in deep nets by discarding a portion of an activation map to reduce its spatial dimensions. This encourages the deep net to learn higher-level representations. Contrary to this motivation, we hypothesize that the discarded activations are useful and can be incorporated on the fly to improve models' prediction. To validate our hypothesis, we propose a search and aggregate method to find useful activation maps to be used at test time. We applied our approach to the task of image classification and semantic segmentation. Extensive experiments over nine different architectures on multiple datasets show that our method consistently improves model test-time performance, complementing existing test-time augmentation techniques. Our code is available at https://github.com/ca-joe-yang/discard-in-subsampling.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the following problem: down-sampling layers in deep neural networks inadvertently discard activations at test time, even though those activations may help the model's prediction. Specifically, the authors hypothesize that the discarded activations are useful and can be incorporated on the fly to improve prediction performance.

### Problem Background

In computer vision tasks, deep neural networks usually assume that data samples are drawn independently and identically from an unknown distribution. Under this assumption, the same forward pass should be used at training and at test time. When the assumption does not hold, however, changing the test-time procedure may lead to better performance. For example, Test-Time Augmentation (TTA) improves model performance by exploiting the prior knowledge encoded in data augmentations. Apart from data augmentation, are there other ways to improve model performance at test time?

### Core Problem of the Paper

The authors re-examine the knowledge built into deep network architectures, in particular the roles of the down-sampling and pooling layers. They observe that models with down-sampling layers do not use all of their activations, because some are discarded. The core question is therefore: **Can the discarded activations be used to improve the model's prediction performance?**

### Solution

To answer this question, the authors propose a search and aggregation framework:

1. **Search for Useful Activation Information**: Define an activation search space in which each state corresponds to an activation map that can be extracted from the down-sampling layers by choosing different selection indices. Given a computational budget, use a greedy search to find the most promising activation maps.
2. **Aggregate Activation Information**: Combine the selected activation maps by weighted averaging to produce the final prediction. To aggregate them more effectively, the authors introduce a learned attention mechanism.

### Experimental Verification

The authors ran extensive experiments on multiple datasets and network architectures, covering image classification and semantic segmentation. The results show that the method consistently improves test-time performance and complements existing TTA methods, so the two can be combined for further gains.

### Main Contributions

- Showed that deep networks with down-sampling layers discard activations that may be useful for prediction.
- Proposed a framework that improves test-time performance by searching over the activations that standard down-sampling discards and aggregating them with an attention aggregation module.
- Verified the effectiveness of the method through extensive experiments on a variety of deep networks, in both image classification and semantic segmentation.

In summary, the paper targets the under-utilization of activations that down-sampling layers inadvertently discard and proposes a new method that uses them to improve test-time performance.
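
To make the search and aggregation steps described under Solution more concrete, here is a minimal NumPy sketch on a toy two-layer network. It is not the authors' implementation: the toy network, the function names (`subsample`, `predict`, `greedy_search`, `aggregate`), the use of prediction entropy as the search criterion, and the entropy-based softmax weights (standing in for the learned attention module) are all assumptions made for illustration.

```python
import heapq
import itertools
import numpy as np

STRIDE = 2
# Selection indices for one stride-2 layer; (0, 0) is the standard choice.
OFFSETS = list(itertools.product(range(STRIDE), repeat=2))

def subsample(x, offset):
    """Keep one of the STRIDE*STRIDE sampling grids of a (C, H, W) map."""
    dy, dx = offset
    return x[:, dy::STRIDE, dx::STRIDE]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def predict(image, state, w):
    """Toy net (assumption): each layer is a ReLU followed by subsampling."""
    x = image
    for offset in state:
        x = subsample(np.maximum(x, 0.0), offset)
    pooled = x.mean(axis=(1, 2))      # global average pooling -> (C,)
    return softmax(w @ pooled)        # class probabilities

def neighbors(state):
    """States that differ from `state` in exactly one layer's offset."""
    for layer, offset in itertools.product(range(len(state)), OFFSETS):
        if offset != state[layer]:
            yield state[:layer] + (offset,) + state[layer + 1:]

def greedy_search(image, w, budget, num_layers=2):
    """Best-first search from the default state; returns `budget` states.

    Assumption: lower prediction entropy marks a more promising state.
    Scoring a candidate costs a forward pass here, so `budget` limits the
    number of maps that are aggregated rather than the total compute.
    """
    start = ((0, 0),) * num_layers
    probs = predict(image, start, w)
    frontier = [(entropy(probs), start, probs)]
    seen, selected = {start}, []
    while frontier and len(selected) < budget:
        _, state, probs = heapq.heappop(frontier)
        selected.append((state, probs))
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                p = predict(image, nxt, w)
                heapq.heappush(frontier, (entropy(p), nxt, p))
    return selected

def aggregate(selected):
    """Weighted average of predictions; weights favor low-entropy states."""
    probs = np.stack([p for _, p in selected])
    weights = softmax(np.array([-entropy(p) for p in probs]))
    return weights @ probs

rng = np.random.default_rng(0)
image = rng.normal(size=(3, 8, 8))    # toy (C, H, W) input
w = rng.normal(size=(4, 3))           # toy classifier head, 4 classes
selected = greedy_search(image, w, budget=4)
final = aggregate(selected)
```

The first state returned is always the default `((0, 0), (0, 0))`, which corresponds to the standard forward pass. The remaining states use activations that a stride-2 layer would normally discard, and `final` is a valid probability vector because it is a convex combination of the individual predictions.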