Abstract: Transformers, due to their ability to learn long-range dependencies, have overcome the shortcomings of convolutional neural networks (CNNs) for global perspective learning. Therefore, they have gained the focus of researchers for several vision-related tasks, including medical diagnosis. However, their multi-head attention module only captures global-level feature representations, which is insufficient for medical images. To address this issue, we propose a Channel Boosted Hybrid Vision Transformer (CB HVT) that uses transfer learning to generate boosted channels and employs both transformers and CNNs to analyse lymphocytes in histopathological images. The proposed CB HVT comprises five modules: a channel generation module, a channel exploitation module, a channel merging module, a region-aware module, and a detection and segmentation head, which work together to effectively identify lymphocytes. The channel generation module uses the idea of channel boosting through transfer learning to extract diverse channels from different auxiliary learners. In the CB HVT, these boosted channels are first concatenated and ranked using an attention mechanism in the channel exploitation module. A fusion block is then utilized in the channel merging module for a gradual and systematic merging of the diverse boosted channels to improve the network's learning representations. The CB HVT also employs a proposal network in its region-aware module and a head to effectively identify objects, even in overlapping regions and with artifacts. We evaluated the proposed CB HVT on two publicly available datasets for lymphocyte assessment in histopathological images. The results show that CB HVT outperformed other state-of-the-art detection models and has good generalization ability, demonstrating its value as a tool for pathologists.

What problem does this paper attempt to address?

The paper addresses the challenge of automatically assessing lymphocytes in histopathological images. Although convolutional neural networks (CNNs) have made remarkable progress in computer vision, they focus mainly on local image features and capture global context poorly, which limits them in complex medical image analysis. Vision Transformers (ViTs) model long-range dependencies through multi-head self-attention and so overcome this limitation, but they bring problems of their own on medical images, such as high computational complexity and a lack of image-specific inductive biases.

To address these challenges, the paper proposes a Channel Boosted Hybrid Vision Transformer Network (CB-HVTNet). The network combines the advantages of CNNs and ViTs and identifies lymphocytes in histopathological images through five main components:

1. **Channel Generation Module**: Uses transfer learning to generate boosted channels, extracting diverse channels from different auxiliary learners.
2. **Channel Exploitation Module**: Concatenates the boosted channels, then ranks and weights them through an attention mechanism so that the network can focus on the most relevant channels.
3. **Channel Merging Module**: Uses a fusion block to merge the diverse boosted channels gradually and systematically, improving the network's learned representations.
4. **Region-Aware Module**: Employs a Region Proposal Network (RPN) to identify regions that may contain lymphocytes.
5. **Detection and Segmentation Head**: Generates the final output: the bounding boxes of candidate objects with their confidence scores, and a binary mask for each object that marks its precise position in the image.

Through these designs, CB-HVTNet aims to provide a more reliable and efficient method for automated lymphocyte assessment, thereby assisting pathologists in disease diagnosis and treatment planning. The paper evaluated CB-HVTNet on two public datasets (LYSTO and NuClick), and the results show that the method generalizes well and can be an effective tool for lymphocyte assessment in pathology.
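The channel generation and exploitation steps described above (concatenate channels from several auxiliary learners, then rank and weight them with attention) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names are hypothetical, the "auxiliary learners" are stand-in lists of feature maps, and the attention here is a parameter-free assumption (a softmax over each channel's global average activation) where a real network would learn the weights.

```python
import math


def global_average_pool(channel):
    """Mean activation of one H x W channel, given as a list of rows."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boost_and_rank(aux_outputs):
    """Concatenate channels from all auxiliary learners, then weight and rank them.

    aux_outputs: one entry per auxiliary learner; each entry is a list of
    H x W channels. Assumption: attention weights are a softmax over
    per-channel mean activations (no learned parameters).
    """
    # Channel boosting: stack the channels of every learner into one list.
    boosted = [ch for channels in aux_outputs for ch in channels]
    # One scalar descriptor per channel, turned into attention weights.
    weights = softmax([global_average_pool(ch) for ch in boosted])
    # Scale each channel by its weight.
    reweighted = [
        [[w * v for v in row] for row in ch] for ch, w in zip(boosted, weights)
    ]
    # Channel indices ordered from most to least attended.
    ranking = sorted(range(len(boosted)), key=lambda i: weights[i], reverse=True)
    return reweighted, weights, ranking


# Two hypothetical auxiliary learners producing 2 x 2 feature maps.
learner_a = [[[1.0, 1.0], [1.0, 1.0]]]
learner_b = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
_, example_weights, example_ranking = boost_and_rank([learner_a, learner_b])
print(example_ranking)  # [2, 0, 1]
```

In this toy input the channel with the highest mean activation receives the largest weight and is ranked first. A learned attention block would replace the fixed softmax, and the merging and region-aware stages would then operate on the reweighted channels.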

CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images

Channel Boosted CNN-Transformer-based Multi-Level and Multi-Scale Nuclei Segmentation

DCT-HistoTransformer: Efficient Lightweight Vision Transformer with DCT Integration for histopathological image analysis

CViTS-Net: A CNN-ViT Network With Skip Connections for Histopathology Image Classification

AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation

Vision transformer-convolution for breast cancer classification using mammography images: A comparative study

Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images

ViT-CB: Integrating hybrid Vision Transformer and CatBoost to enhanced brain tumor detection with SHAP

Vision Transformer-based Multimodal Feature Fusion Network for Lymphoma Segmentation on PET/CT Images

VITALT: a robust and efficient brain tumor detection system using vision transformer with attention and linear transformation

A LLM-Based Hybrid-Transformer Diagnosis System in Healthcare

DBCvT: Double Branch Convolutional Transformer for Medical Image Classification

Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification

Boosted EfficientNet: Detection of Lymph Node Metastases in Breast Cancer Using Convolutional Neural Network

MC-GAT: multi-layer collaborative generative adversarial transformer for cholangiocarcinoma classification from hyperspectral pathological images

Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification

Pathological Insights: Enhanced Vision Transformers for the Early Detection of Colorectal Cancer

Enhancing medical image analysis: A fusion of fully connected neural network classifier with CNN-VIT for improved retinal disease detection

An Explainable Vision Transformer Model Based White Blood Cells Classification and Localization

HTC-retina: A hybrid retinal diseases classification model using transformer-Convolutional Neural Network from optical coherence tomography images