CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images

Momina Liaqat Ali, Zunaira Rauf, Asifullah Khan, Anabia Sohail, Rafi Ullah, Jeonghwan Gwak
DOI: https://doi.org/10.1109/ACCESS.2023.3324383
2023-07-19
Abstract: Transformers, due to their ability to learn long-range dependencies, have overcome the shortcomings of convolutional neural networks (CNNs) for global perspective learning. Therefore, they have gained the focus of researchers for several vision-related tasks, including medical diagnosis. However, their multi-head attention module only captures global-level feature representations, which is insufficient for medical images. To address this issue, we propose a Channel Boosted Hybrid Vision Transformer (CB HVT) that uses transfer learning to generate boosted channels and employs both transformers and CNNs to analyse lymphocytes in histopathological images. The proposed CB HVT comprises five modules, including a channel generation module, channel exploitation module, channel merging module, region-aware module, and a detection and segmentation head, which work together to effectively identify lymphocytes. The channel generation module uses the idea of channel boosting through transfer learning to extract diverse channels from different auxiliary learners. In the CB HVT, these boosted channels are first concatenated and ranked using an attention mechanism in the channel exploitation module. A fusion block is then utilized in the channel merging module for a gradual and systematic merging of the diverse boosted channels to improve the network's learning representations. The CB HVT also employs a proposal network in its region-aware module and a head to effectively identify objects, even in overlapping regions and with artifacts. We evaluated the proposed CB HVT on two publicly available datasets for lymphocyte assessment in histopathological images. The results show that CB HVT outperformed other state-of-the-art detection models and has good generalization ability, demonstrating its value as a tool for pathologists.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the challenge of automatically assessing lymphocytes in histopathological images. Convolutional neural networks (CNNs) have made remarkable progress in computer vision tasks, but they focus mainly on local image features and capture global context poorly, which limits them in complex medical image analysis. Vision Transformers (ViTs) can model long-range dependencies through the multi-head self-attention mechanism and so overcome this limitation of CNNs, but they have their own problems with medical images, such as high computational complexity and a lack of image-specific inductive biases.

To address these challenges, the paper proposes a Channel Boosted Hybrid Vision Transformer Network (CB-HVTNet). The network combines the advantages of CNNs and ViTs and identifies lymphocytes in histopathological images through five main components:

1. **Channel Generation Module**: Uses transfer learning to generate boosted channels, extracting diverse channels from different auxiliary learners.
2. **Channel Exploitation Module**: Concatenates the boosted channels, then ranks and weights them through an attention mechanism so that the network focuses on the most relevant channels.
3. **Channel Merging Module**: Uses a fusion block to merge the diverse boosted channels gradually and systematically, improving the network's learned representations.
4. **Region-Aware Module**: Employs a Region Proposal Network (RPN) to identify regions that may contain lymphocytes.
5. **Detection and Segmentation Head**: Generates the final output: the bounding boxes of candidate objects with their confidence scores, and a binary mask for each object indicating its precise position in the image.

Through these designs, CB-HVTNet aims to provide a more reliable and efficient method for automated lymphocyte assessment, thereby assisting pathologists in disease diagnosis and treatment planning. The paper evaluates CB-HVTNet on two public datasets (LYSTO and NuClick), and the results show that the method generalizes well and can be an effective tool for lymphocyte assessment in pathology.
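The attention-based channel ranking described above (the channel exploitation module in the abstract) can be illustrated with a small sketch. This is not the authors' implementation: the summary does not specify the attention mechanism, so the code assumes a simplified, parameter-free variant in which each channel is scored by global average pooling followed by a softmax across channels. The names `concat_channels`, `channel_scores`, and `reweight`, and the toy 2x2 feature maps, are hypothetical and exist only for illustration.

```python
import math
from typing import List, Sequence

# One channel is an H x W grid of activations (plain nested lists for clarity).
FeatureMap = List[List[float]]


def concat_channels(*learner_outputs: Sequence[FeatureMap]) -> List[FeatureMap]:
    """Stack the channels produced by each auxiliary learner into one list."""
    boosted: List[FeatureMap] = []
    for channels in learner_outputs:
        boosted.extend(channels)
    return boosted


def channel_scores(channels: Sequence[FeatureMap]) -> List[float]:
    """Score channels: global average pooling, then softmax across channels.

    Assumption: a real attention block would apply learned weights here;
    this sketch has no trainable parameters.
    """
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    peak = max(pooled)  # subtract the max for numerical stability
    exps = [math.exp(p - peak) for p in pooled]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(channels: Sequence[FeatureMap], scores: Sequence[float]) -> List[FeatureMap]:
    """Scale every activation in a channel by that channel's attention score."""
    return [[[v * s for v in row] for row in ch] for ch, s in zip(channels, scores)]


# Toy outputs from two hypothetical auxiliary learners.
learner_a = [[[1.0, 1.0], [1.0, 1.0]]]
learner_b = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]

boosted = concat_channels(learner_a, learner_b)
scores = channel_scores(boosted)
# Channel indices ordered from most to least relevant.
ranking = sorted(range(len(boosted)), key=lambda i: scores[i], reverse=True)
weighted = reweight(boosted, scores)
```

In this toy input the channel with the largest mean activation receives the highest score, so it comes first in `ranking`. A trained model would learn the scoring function rather than rely on raw activation magnitude.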