Abstract: Inspired by the success of Transformers in Natural Language Processing, many applications of Vision Transformers (ViTs) have been investigated in medical image analysis, including breast ultrasound (BUS) image segmentation and classification. In this paper, we propose an efficient multi-task framework to segment and classify tumors in BUS images using a hybrid convolutional neural network (CNN)-ViT architecture and a Multi-Layer Perceptron (MLP)-Mixer. The proposed method uses a two-encoder architecture with an EfficientNetV2 backbone and an adapted ViT encoder to extract tumor regions in BUS images. The self-attention (SA) mechanism in the Transformer encoder captures a wide range of high-level and complex features, while the EfficientNetV2 encoder preserves local information in the image. To fuse the extracted features, a Channel Attention Fusion (CAF) module is introduced. The CAF module selectively emphasizes important features from both encoders, improving the integration of high-level and local information. A decoder then reconstructs the resulting feature maps into segmentation maps. Finally, our method classifies the segmented tumor regions as benign or malignant using a simple and efficient classifier based on MLP-Mixer, which, to the best of our knowledge, is applied here for the first time to lesion classification in BUS images. Experimental results show that our framework outperforms recent works, reaching a Dice coefficient of 83.42% for segmentation and an accuracy of 86% for classification.
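The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes binary masks flattened to 0/1 sequences and integer class labels; the function names and the `eps` smoothing term are illustrative choices, not taken from the paper.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |P and T| / (|P| + |T|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps (an assumption of this sketch) keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def accuracy(pred_labels, true_labels):
    """Fraction of samples whose predicted class (e.g. 0 = benign, 1 = malignant) is correct."""
    if len(pred_labels) != len(true_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)
```

For example, a predicted mask `[0, 1, 1, 1, 0, 0]` against a ground truth `[0, 1, 1, 0, 0, 1]` shares two foreground pixels out of three in each mask, giving a Dice score of about 0.667.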

What problem does this paper attempt to address?

This paper addresses tumor segmentation and classification in breast ultrasound (BUS) images. Specifically, the authors propose a multi-task framework based on a hybrid CNN-Transformer architecture and the MLP-Mixer model to improve the accuracy of breast tumor segmentation and classification in ultrasound images. The main problems and background of the study are:

1. **Requirements for breast cancer diagnosis**:
   - Breast cancer is one of the most common cancers among women and the second-leading cause of cancer-related deaths. Early detection and diagnosis can significantly reduce the mortality rate.
   - Breast ultrasound (BUS) imaging is an economical, safe, and portable technology with important applications in breast cancer screening.
2. **Limitations of current methods**:
   - Diagnosis from BUS images depends on the operator's experience and skill and is easily affected by noise, which makes diagnosis difficult.
   - Reviewing a large number of BUS images takes radiologists and clinicians a great deal of time and increases their workload.
   - Although CNNs perform well in medical image analysis, their local receptive field limits their ability to capture long-range dependencies.
   - Existing methods struggle with the high similarity between benign and malignant tumors, irregular tumor boundaries, and variations in lesion size and shape.
3. **Proposed solutions**:
   - **Multi-task framework**: An efficient multi-task framework for breast tumor segmentation and classification that combines the advantages of CNNs and Transformers.
   - **Feature extraction**: EfficientNetV2 and an adapted ViT encoder extract rich features and context information at different scales.
   - **Feature fusion**: A Channel Attention Fusion (CAF) module improves the integration of high-level and local information by selectively emphasizing important features from the two encoders.
   - **Classification model**: The MLP-Mixer is applied to BUS image classification for the first time, and a comparative experiment with the ViT model shows the performance differences between the two models on this task.

Through these improvements, the study aims to raise the accuracy of breast tumor segmentation and classification and to provide more effective tools for computer-aided diagnosis (CAD) systems, helping medical professionals diagnose breast tumors more accurately.
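The feature fusion step can be sketched in a few lines. The summary does not give the internals of the CAF module, so the code below is only an assumption: it uses a squeeze-and-excitation-style channel gate over the concatenated encoder outputs. The function name, the weight matrices `w1` and `w2`, and the channel-first `(C, H, W)` layout are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_fusion(cnn_feat, vit_feat, w1, w2):
    """Fuse two (C, H, W) feature maps using per-channel attention weights.

    Assumed design (not from the paper): concatenate along channels, squeeze
    each channel to a scalar, pass through a two-layer bottleneck, and rescale.
    """
    if cnn_feat.shape[1:] != vit_feat.shape[1:]:
        raise ValueError("feature maps must share the same spatial size")
    fused = np.concatenate([cnn_feat, vit_feat], axis=0)  # (C1 + C2, H, W)
    squeezed = fused.mean(axis=(1, 2))                    # global average pool -> (C1 + C2,)
    hidden = np.maximum(w1 @ squeezed, 0.0)               # bottleneck with ReLU
    weights = sigmoid(w2 @ hidden)                        # one weight in (0, 1) per channel
    return fused * weights[:, None, None]                 # emphasize or suppress channels


# Usage with random stand-ins for the two encoder outputs.
rng = np.random.default_rng(0)
cnn_feat = rng.standard_normal((4, 8, 8))  # hypothetical EfficientNetV2 features
vit_feat = rng.standard_normal((4, 8, 8))  # hypothetical ViT features, reshaped to a grid
w1 = rng.standard_normal((2, 8)) * 0.1     # reduces 8 channels to 2
w2 = rng.standard_normal((8, 2)) * 0.1     # expands back to 8 channel weights
fused_out = channel_attention_fusion(cnn_feat, vit_feat, w1, w2)
```

In a trained model `w1` and `w2` would be learned, so the network itself decides which CNN and ViT channels to emphasize before the decoder reconstructs the segmentation map.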

Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images

Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network

Mmformer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation

A Multi-Task Transformer with Local-Global Feature Interaction and Multiple Tumoral Region Guidance for Breast Cancer Diagnosis

Aggregating efficient transformer and CNN networks using learnable fuzzy measure for breast tumor malignancy prediction in ultrasound images

EfficientUNetViT: Efficient Breast Tumor Segmentation Utilizing UNet Architecture and Pretrained Vision Transformer

A Multi-Task Learning Framework for Automated Segmentation and Classification of Breast Tumors From Ultrasound Images

Multi-task learning for segmentation and classification of breast tumors from ultrasound images

MFMSNet: A Multi-frequency and Multi-scale Interactive CNN-Transformer Hybrid Network for breast ultrasound image segmentation

A dual-stage transformer and MLP-based network for breast ultrasound image segmentation

HAU-Net: Hybrid CNN-transformer for breast ultrasound image segmentation

AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

Vision Transformer for Classification of Breast Ultrasound Images

BUViTNet: Breast Ultrasound Detection via Vision Transformers

Combining the Transformer and Convolution for Effective Brain Tumor Classification Using MRI Images

Vision transformer-convolution for breast cancer classification using mammography images: A comparative study

Breast Tumor Classification in Ultrasound Images by Fusion of Deep Convolutional Neural Network and Shallow LBP Feature

Multi-task Learning for Segmentation and Classification of Tumors in 3D Automated Breast Ultrasound Images

MMMViT: Multiscale multimodal vision transformer for brain tumor segmentation with missing modalities

Improved breast ultrasound tumor classification using dual-input CNN with GAP-guided attention loss

multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information