Mitigating Vanishing Activations in Deep CapsNets Using Channel Pruning

Siddharth Sahu, Abdulrahman Altahhan
2024-10-22
Abstract: Capsule Networks outperform Convolutional Neural Networks in learning the part-whole relationships with viewpoint invariance, and the credit goes to their multidimensional capsules. It was assumed that increasing the number of capsule layers in the capsule networks would enhance the model performance. However, recent studies found that Capsule Networks lack scalability due to vanishing activations in the capsules of deeper layers. This paper thoroughly investigates the vanishing activation problem in deep Capsule Networks. To analyze this issue and understand how increasing capsule dimensions can facilitate deeper networks, various Capsule Network models are constructed and evaluated with different numbers of capsules, capsule dimensions, and intermediate layers for this paper. Unlike traditional model pruning, which reduces the number of model parameters and expedites model training, this study uses pruning to mitigate the vanishing activations in the deeper capsule layers. In addition, the backbone network and capsule layers are pruned with different pruning ratios to reduce the number of inactive capsules and achieve better model accuracy than the unpruned models.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the vanishing activation problem that Capsule Networks encounter as their depth increases. Specifically, as a capsule network becomes deeper, the activation values of the capsules in the deeper layers tend toward zero, which degrades network performance and limits scalability. To alleviate this problem, the authors propose a method based on channel pruning, combined with the correlation coefficient matrix (CCM) loss function to optimize model training.

### Detailed Explanation:

1. **Background and Motivation**:
   - **Advantages of Capsule Networks**: Capsule networks learn part-whole relationships and viewpoint invariance better than convolutional neural networks (CNNs).
   - **Problem Proposal**: Although capsule networks perform well as shallow structures, their performance declines as depth increases because the activation values gradually approach zero. This is known as the "vanishing activation" problem.
2. **Research Objectives**:
   - **Alleviating the Vanishing Activation Problem**: Use pruning to reduce the number of inactive capsules in deep capsule networks, thereby improving the accuracy and scalability of the model.
   - **Introducing a New Pruning Strategy**: Use structured pruning methods (such as the CHIP algorithm) together with the CCM loss function so that effective feature channels are retained during pruning.
3. **Method Overview**:
   - **Model Architecture**: Construct a deep capsule network with multiple intermediate capsule layers (IntermediateCaps).
   - **Pruning Technique**: Use the CHIP algorithm to evaluate the importance of each channel, and prune channels according to their importance scores.
   - **Loss Function**: Combine the capsule margin loss and the CCM loss to optimize model training.
4. **Experimental Results**:
   - **Influence of Different Pruning Ratios**: The experiments find that an appropriate pruning ratio improves the model's validation accuracy.
   - **Alleviation of the Vanishing Activation Phenomenon**: As the pruning ratio increases, the activation values of the deep-layer capsules no longer tend toward zero, and model performance improves.

### Conclusion:

By introducing channel pruning and the CCM loss function, this paper alleviates the vanishing activation problem in deep capsule networks and improves the accuracy and scalability of the model. The method provides new ideas and tools for future research on deeper capsule networks.

### Formula Summary:

- **Dynamic Routing Formula**:

  \[ b_{ij}^{(r + 1)}=b_{ij}^{(r)}+\hat{u}_{j|i}\cdot\text{squash}(s_j) \]

  \[ \hat{u}_{j|i}=W_{ij}u_i \]

  \[ s_j=\sum_i c_{ij}\hat{u}_{j|i} \]

  \[ \text{squash}(s_j)=\frac{\|s_j\|^2}{1+\|s_j\|^2}\cdot\frac{s_j}{\|s_j\|} \]

- **Loss Function**:

  \[ \text{Loss}_{\text{total}}=\sum_{n}\text{Loss}_{\text{margin}}^n-\alpha\sum_{l}\text{Loss}_{\text{ccm}}^{(l)} \]

  \[ \text{Loss}_{\text{margin}}^n=T_n\max(0,t_{\text{pos}}-\|v_n\|)^2+\lambda(1 - T_n)\max(0,\|v_n\| - t_{\text{neg}})^2 \]

  \[ \text{Loss}_{\text{ccm}}^{(l)}=\frac{1}{C(l)\times C(l)}\sum_{i}\sum_{j}\left|\text{corr}\right| \]
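
The dynamic routing formulas in the summary above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the coupling coefficients `c_ij` are a softmax of the logits `b_ij` over the output capsules, as in the original dynamic routing algorithm; the summary does not state this. The tensor shapes, the weight scale, the iteration count, and the function names are all hypothetical choices for illustration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    # eps avoids division by zero for all-zero vectors (an added safeguard).
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat has shape (num_in, num_out, dim_out); returns (num_out, dim_out)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                 # routing logits b_ij
    for _ in range(num_iters):
        c = softmax(b, axis=1)                      # assumed: c_ij = softmax_j(b_ij)
        s = np.sum(c[:, :, None] * u_hat, axis=0)   # s_j = sum_i c_ij * u_hat_{j|i}
        v = squash(s)                               # output capsule vectors
        b = b + np.sum(u_hat * v[None, :, :], axis=-1)  # agreement update
    return v

# Hypothetical sizes: 8 input capsules of dimension 6, 4 output capsules of dimension 5.
rng = np.random.default_rng(0)
num_in, num_out, dim_in, dim_out = 8, 4, 6, 5
u = rng.normal(size=(num_in, dim_in))
W = rng.normal(size=(num_in, num_out, dim_out, dim_in)) * 0.1
u_hat = np.einsum("ijkl,il->ijk", W, u)             # u_hat_{j|i} = W_ij u_i
v = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=-1)                # capsule activations, each in [0, 1)
```

Because the squash factor `||s||^2 / (1 + ||s||^2)` is small whenever `||s||` is small, a capsule that receives weak input produces an even weaker output, which is one way to see how activations can shrink as capsule layers are stacked.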
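
The margin loss and total loss above can be written out as a short sketch. The defaults `t_pos=0.9`, `t_neg=0.1`, and `lam=0.5` are the values commonly used in the original CapsNet work and are assumptions here, since the summary gives only the symbols. The per-layer CCM losses are passed in as precomputed numbers because their full definition is not given in the summary, and the `alpha` value is hypothetical.

```python
import numpy as np

def margin_loss(lengths, targets, t_pos=0.9, t_neg=0.1, lam=0.5):
    """lengths: capsule lengths ||v_n|| per class; targets: one-hot labels T_n."""
    lengths = np.asarray(lengths, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Penalize the true class when its capsule is shorter than t_pos.
    pos = targets * np.maximum(0.0, t_pos - lengths) ** 2
    # Penalize the other classes when their capsules are longer than t_neg.
    neg = lam * (1.0 - targets) * np.maximum(0.0, lengths - t_neg) ** 2
    return float(np.sum(pos + neg))

def total_loss(lengths, targets, ccm_losses, alpha=0.1):
    """Margin loss minus alpha times the summed per-layer CCM losses,
    following the sign convention of the formula in the summary."""
    return margin_loss(lengths, targets) - alpha * float(np.sum(ccm_losses))

targets = [1, 0, 0]
healthy = margin_loss([0.95, 0.05, 0.30], targets)   # only the third class is penalized
vanished = margin_loss([0.0, 0.0, 0.0], targets)     # true-class capsule has collapsed
combined = total_loss([0.95, 0.05, 0.30], targets, ccm_losses=[0.5, 0.3])
```

In the `vanished` case every capsule has zero length, so the only penalty comes from the true class falling short of `t_pos`; this is the situation that vanishing activations create in the final capsule layer.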
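
The pruning step described above (score each channel, then prune by score) can be sketched as follows. This is not the CHIP algorithm: the importance scores are taken as given, and how CHIP computes them is outside this sketch. The helper names, the `threshold` used to call a capsule inactive, and the example values are hypothetical.

```python
import numpy as np

def channels_to_keep(importance, prune_ratio):
    """Return sorted indices of the highest-scoring channels to retain."""
    importance = np.asarray(importance, dtype=float)
    num_channels = importance.shape[0]
    num_keep = max(1, int(round(num_channels * (1.0 - prune_ratio))))
    keep = np.argsort(importance)[::-1][:num_keep]   # highest scores first
    return np.sort(keep)

def prune_conv_weights(weights, keep):
    """weights: (out_channels, in_channels, kH, kW). Drops pruned output channels.
    The next layer's input channels would need the same indices removed."""
    return weights[keep]

def inactive_fraction(capsules, threshold=1e-2):
    """Fraction of capsules whose length falls below an assumed threshold."""
    lengths = np.linalg.norm(np.asarray(capsules, dtype=float), axis=-1)
    return float(np.mean(lengths < threshold))

# Hypothetical scores for a 4-channel layer, pruned at a ratio of 0.5.
importance = [0.9, 0.1, 0.5, 0.7]
keep = channels_to_keep(importance, prune_ratio=0.5)
weights = np.ones((4, 3, 3, 3))
pruned = prune_conv_weights(weights, keep)

# Four 2-D capsules, two of which are effectively inactive.
capsules = [[0.0, 0.0], [0.3, 0.4], [0.001, 0.0], [1.0, 0.0]]
frac = inactive_fraction(capsules)
```

Comparing `inactive_fraction` before and after pruning, per capsule layer, is one simple way to check whether a given pruning ratio actually reduces the number of inactive capsules.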