InceptionCapsule: Inception-Resnet and CapsuleNet with self-attention for medical image Classification

Elham Sadeghnezhad, Sajjad Salem
2024-02-04
Abstract: Weight initialization is significant in deep neural networks because randomly selected initial weights produce different outputs and increase the probability of overfitting and underfitting. In addition, vector-based approaches need rich feature vectors for more accurate classification. The InceptionCapsule approach is presented to alleviate these two problems. It uses transfer learning with the Inception-ResNet model, taking initial weights from ImageNet, to avoid random weight selection. It also uses the output of intermediate Inception layers to generate rich vectors. The extracted vectors are given to a capsule network, equipped with an attention technique, for learning. The Kvasir and BUSI with GT datasets were used to evaluate this approach. The model achieved 97.62% accuracy in 5-class classification and 94.30% accuracy in 8-class classification on Kvasir. On the BUSI with GT dataset, the proposed approach achieved accuracy = 98.88%, precision = 95.34%, and F1-score = 93.74%, which are acceptable results compared to other approaches in the literature.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper primarily aims to address two key issues in medical image classification:

1. **Initial Weight Selection Problem**: In deep neural networks, randomly selecting initial weights can lead to different outputs and increase the probability of overfitting or underfitting.
2. **Rich Vector Selection Problem**: Vector-based methods require rich feature vectors for more accurate classification.

To tackle these issues, the authors propose a method called "InceptionCapsule." The method applies transfer learning to the Inception-ResNet model, taking its initial weights from ImageNet instead of selecting them randomly. It also uses the output of the intermediate Inception layers to generate rich feature vectors. These extracted vectors are fed into a capsule network (CapsuleNet) with a self-attention mechanism for learning.

### Main Contributions

- Using transfer learning with ImageNet-pretrained weights to initialize the network.
- Utilizing Inception to extract rich feature vectors for the capsule network.
- Using a self-attention mechanism to learn the best features.

With this approach, the paper reports 97.62% accuracy for five-class classification and 94.30% accuracy for eight-class classification on the Kvasir dataset. On the BUSI with GT dataset, the method achieves 98.88% accuracy, 95.34% precision, and a 93.74% F1-score, which the authors describe as acceptable results compared to other approaches in the literature.
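
As a rough illustration of the pipeline described above, the sketch below splits a flat feature vector into capsule vectors, applies the standard capsule squash non-linearity, and passes the capsules through a scaled dot-product self-attention step. It is a minimal pure-Python sketch, not the authors' implementation. The random `features` list is only a stand-in for intermediate Inception-ResNet activations, and the helper names (`to_capsules`, `self_attention`), the capsule dimension of 8, and the identity query/key/value projections are all assumptions made for illustration, since the summary does not specify how attention is wired into the capsule network.

```python
import math
import random


def squash(vec, eps=1e-9):
    """Capsule squash: keep the vector's direction, map its length into [0, 1)."""
    sq_norm = sum(x * x for x in vec)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_capsules(feature_vector, capsule_dim):
    """Split a flat feature vector into squashed capsules of size capsule_dim."""
    if len(feature_vector) % capsule_dim != 0:
        raise ValueError("feature length must be a multiple of capsule_dim")
    return [
        squash(feature_vector[i:i + capsule_dim])
        for i in range(0, len(feature_vector), capsule_dim)
    ]


def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors (assumption)."""
    dim = len(vectors[0])
    attended = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]
        )
    return attended


# Hypothetical stand-in for mid-layer backbone activations (not real data).
random.seed(0)
features = [random.gauss(0.0, 1.0) for _ in range(32)]

capsules = to_capsules(features, capsule_dim=8)   # 4 capsules of dimension 8
attended = self_attention(capsules)               # same shape as capsules
capsule_lengths = [math.sqrt(sum(x * x for x in c)) for c in capsules]
```

Because each attended vector is a weighted average of squashed capsules, its length also stays below 1, so it can still be read as a capsule activation. A trained model would replace the identity projections with learned query, key, and value weights.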