AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

Xin Zhao, Qianqian Zhu, Jialing Wu
2024-07-28
Abstract: To address the similarity between lesions and surrounding tissue, the overlapping appearance of some benign and malignant nodules, and the resulting difficulty of classification, a deep learning network that integrates a CNN and a Transformer is proposed for classifying benign and malignant breast lesions in ultrasound images. The network adopts a dual-branch architecture for local-global feature extraction, combining the strength of CNNs in extracting local features with the ability of ViT to extract global features in order to strengthen feature extraction for breast nodules. The local feature extraction branch employs a residual network with multiple attention-guided modules, which captures the local details and texture features of breast nodules and increases sensitivity to subtle changes within them, aiding accurate benign-malignant classification. The global feature extraction branch uses a ViT network with multi-head self-attention, which captures a nodule's overall shape, boundary, and relationship with surrounding tissue, thereby improving the modeling of both nodule and global image features. Experimental results on a public breast ultrasound nodule dataset show that the proposed method outperforms the other networks it was compared with. This indicates that fusing CNN and Transformer networks can effectively improve classification performance and offers a strong solution for benign-malignant classification of breast nodules in ultrasound images.
Image and Video Processing, Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main objective of this paper is to propose a new deep learning network architecture, AResNet-ViT, for classifying benign and malignant nodules in breast ultrasound images. Specifically, the study aims to address the following issues:

1. **Improve Classification Accuracy**: Classifying benign and malignant nodules in breast ultrasound images is difficult when nodules resemble the surrounding tissue or when some benign and malignant nodules have overlapping appearances.
2. **Integrate Local and Global Features**: Traditional Convolutional Neural Networks (CNNs) excel at extracting local features but may struggle to capture global feature information, whereas the Transformer architecture excels at capturing global dependencies. A method that combines these two strengths is therefore needed to improve classification performance.
3. **Utilize Attention Mechanisms**: Attention mechanisms strengthen the network's focus on nodule regions and its sensitivity to details inside the nodules, thereby increasing classification accuracy.

To achieve these goals, the paper proposes a dual-branch hybrid CNN-Transformer network (AResNet-ViT). One branch uses a Residual Network (ResNet) for local feature extraction and refines the extracted features through several attention mechanisms (such as region attention and channel attention); the other branch uses a Vision Transformer (ViT) for global feature extraction. This design lets the network interpret the image at both local and global scales, thereby improving the accuracy of breast nodule classification.

Experimental results show that the AResNet-ViT network achieves high classification performance on the public BUSI dataset, as measured by Accuracy (ACC), True Positive Rate (TPR), True Negative Rate (TNR), and Area Under the Curve (AUC).
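
For readers less familiar with the four reported metrics, the following is a minimal sketch of how ACC, TPR, TNR, and AUC are conventionally computed for a binary classifier. It is not the paper's evaluation code. The function names, the label convention (1 = malignant, 0 = benign), and the toy data are assumptions made for illustration, and the sketch assumes both classes are present in the labels.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, TN, FP, FN) for binary labels.

    Assumed convention: 1 = malignant (positive), 0 = benign (negative).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return ACC, TPR (sensitivity), and TNR (specificity)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn),  # share of malignant cases detected
        "TNR": tn / (tn + fp),  # share of benign cases correctly cleared
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not results from the paper.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    scores = [0.9, 0.4, 0.2, 0.3, 0.8, 0.6]
    print(classification_metrics(y_true, y_pred))
    print("AUC:", auc(y_true, scores))
```

ACC, TPR, and TNR depend on the thresholded predictions, while AUC depends only on how the raw scores rank malignant cases against benign ones, which is why the two are usually reported together.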
Additionally, compared with classical classification models and other recently published methods, AResNet-ViT performs well on all evaluation metrics, demonstrating its effectiveness for breast nodule classification.
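
To make the dual-branch idea described above more concrete, here is a toy, standard-library-only sketch of the data flow: a local branch with channel attention, a global branch with self-attention over patch tokens, and a fusion step. It is not the AResNet-ViT implementation. All function names are hypothetical, the attention functions use no learned weights, and fusion by concatenation is an assumption, since the summary does not state how the two branches are combined.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention over patch tokens.

    Simplification: queries, keys, and values are the tokens themselves
    (identity projections), whereas a real ViT learns these projections.
    """
    d = len(tokens[0])
    attended = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        attended.append([sum(w * v[i] for w, v in zip(weights, tokens))
                         for i in range(d)])
    return attended


def channel_attention(channels):
    """Rescale each channel by a sigmoid gate on its global average.

    Simplification: no learned layers between pooling and gating.
    """
    gated = []
    for ch in channels:
        gate = 1.0 / (1.0 + math.exp(-sum(ch) / len(ch)))
        gated.append([gate * x for x in ch])
    return gated


def local_branch(channels):
    """Stand-in for the CNN branch: channel attention, then average pooling."""
    return [sum(ch) / len(ch) for ch in channel_attention(channels)]


def global_branch(tokens):
    """Stand-in for the ViT branch: self-attention, then mean over tokens."""
    attended = self_attention(tokens)
    dim = len(attended[0])
    return [sum(tok[i] for tok in attended) / len(attended)
            for i in range(dim)]


def fuse(local_vec, global_vec):
    """Assumed fusion: concatenate the local and global feature vectors."""
    return list(local_vec) + list(global_vec)


if __name__ == "__main__":
    # Hypothetical inputs: 2 flattened feature-map channels and 3 patch tokens.
    channels = [[0.2, 0.8, 0.5, 0.1], [1.0, 0.9, 1.1, 1.2]]
    tokens = [[0.1, 0.3], [0.4, 0.2], [0.0, 0.5]]
    features = fuse(local_branch(channels), global_branch(tokens))
    print(len(features), features)
```

In a real model the fused vector would feed a classification head that outputs benign and malignant probabilities, and both branches would be trained end to end.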