SENetV2: Aggregated dense layer for channelwise and global representations

Mahendran Narayanan
2023-11-17
Abstract: Convolutional Neural Networks (CNNs) have revolutionized image classification by extracting spatial features and enabling state-of-the-art accuracy in vision-based tasks. The module proposed in the squeeze-and-excitation network gathers channel-wise representations of the input. Multilayer perceptrons (MLPs) learn global representations from the data and are used in most image classification models to learn from the extracted features of the image. In this paper, we introduce a novel aggregated multilayer perceptron, a multi-branch dense layer, within the squeeze-and-excitation residual module, designed to surpass the performance of existing architectures. Our approach combines the squeeze-and-excitation network module with dense layers. This fusion enhances the network's ability to capture channel-wise patterns and acquire global knowledge, leading to a better feature representation. The proposed model has a negligible increase in parameters compared to SENet. We conduct extensive experiments on benchmark datasets to validate the model and compare it with established architectures. Experimental results demonstrate a remarkable increase in the classification accuracy of the proposed model.
Computer Vision and Pattern Recognition, Artificial Intelligence
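The module described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The names `se_multibranch` and `make_params`, the `reduction` and `branches` values, the ReLU after each branch, and the concatenation of branch outputs are all assumptions made for illustration. The convolutional layers and the residual shortcut around the module are omitted.

```python
import math
import random

def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def relu(vec):
    return [max(0.0, v) for v in vec]

def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]

def init_dense(n_in, n_out, rng):
    """Small random weights and zero bias (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def make_params(channels, reduction=4, branches=4, seed=0):
    """Hypothetical parameter layout: `branches` parallel dense layers,
    each reducing C to C // reduction, plus one layer mapping back to C."""
    rng = random.Random(seed)
    hidden = max(1, channels // reduction)
    return {
        "branches": [init_dense(channels, hidden, rng) for _ in range(branches)],
        "excite": init_dense(hidden * branches, channels, rng),
    }

def se_multibranch(feature_map, params):
    """feature_map: list of C channels, each an H x W list of lists."""
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Aggregated dense layer: run every branch on the squeezed vector
    # and concatenate the results (assumed aggregation rule).
    hidden = []
    for weights, bias in params["branches"]:
        hidden.extend(relu(dense(squeezed, weights, bias)))
    # Excitation: map back to C values and squash them into (0, 1).
    weights, bias = params["excite"]
    gates = sigmoid(dense(hidden, weights, bias))
    # Scale: multiply every channel by its gate.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, feature_map)]
    return scaled, gates

# Usage: an 8-channel, 3 x 3 feature map with random activations.
rng = random.Random(1)
x = [[[rng.random() for _ in range(3)] for _ in range(3)] for _ in range(8)]
params = make_params(channels=8)
y, gates = se_multibranch(x, params)
print(len(gates), len(y), len(y[0]), len(y[0][0]))
```

With `branches=1` the sketch reduces to a standard squeeze-and-excitation gate, so the multi-branch layer is the only structural difference shown here.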
What problem does this paper attempt to address?
The paper introduces an advanced version of the Squeeze and Excitation (SE) network, termed SENetV2, which aims to improve the representational power and classification accuracy of convolutional neural networks (CNNs). The primary problem addressed by this research is the enhancement of feature representation in CNNs, particularly in the context of image classification tasks.

### Key Objectives

1. **Enhancing Channel-Wise Representations**: The authors aim to improve the channel-wise feature representation by combining the SE module with aggregated dense layers, which are multi-branch dense layers. This combination is expected to provide a better balance between channel-wise and global representations.
2. **Reducing Theoretical Complexity**: The proposed approach seeks to reduce the theoretical complexity of the network while maintaining or improving performance. This is achieved through the use of aggregated modules, inspired by the Inception architecture, which helps in capturing spatial representations more efficiently.
3. **Improving Classification Accuracy**: The ultimate goal is to increase the classification accuracy of the network on benchmark datasets. The authors conduct experiments to validate the effectiveness of the proposed model against established architectures.

### Contributions

1. **Integration of Aggregated Modules**: The paper acknowledges the benefits of aggregated modules and their robust representational capabilities, drawing inspiration from Ince