Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation

Abu Bakor Hayat Arnob, Xiangxue Wang, Yiping Jiao, Xiao Gan, Wenlong Ming, Jun Xu
2024-04-13
Abstract: Medical image processing usually requires a model trained on carefully crafted datasets because of unique image characteristics and domain-specific challenges, especially in pathology. Primitive detection and segmentation in digitized tissue samples are essential for objective, automated diagnosis and prognosis of cancer. SAM (Segment Anything Model) was recently developed to segment general objects in natural images with high accuracy, but it requires human prompts to generate masks. In this work, we present a novel approach that adapts the pre-trained natural image encoder of SAM for detection-based region proposals. Regions proposed by the pre-trained encoder are sent to cascaded feature propagation layers for projection. Local semantics and global context are then aggregated across multiple scales for bounding box localization and classification. Finally, the SAM decoder uses the identified bounding boxes as the essential prompts to generate a comprehensive primitive segmentation map. The base framework, SAM, requires no additional training or fine-tuning, yet the method produces end-to-end results for two fundamental segmentation tasks in pathology. Our method is comparable with state-of-the-art models in F1 score for nuclei detection, and in binary/multiclass panoptic quality (bPQ/mPQ) and mask quality (Dice) for segmentation, on the PanNuke dataset, while offering end-to-end efficiency. Our model also achieves a higher Average Precision (+4.5%) than Faster R-CNN on the secondary dataset (HuBMAP Kidney). The code is publicly available at
Computer Vision and Pattern Recognition
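The abstract reports results in terms of Dice, panoptic quality (PQ), and an F1-style detection score. As a rough illustration of how such numbers are formed, the sketch below computes Dice and PQ on toy masks in plain Python. It is not the authors' evaluation code. The helper names (`rect`, `dice`, `iou`, `panoptic_quality`) are hypothetical, masks are represented as sets of pixel coordinates for brevity, and instances are matched at IoU > 0.5, which is the usual PQ convention; the matching rule behind the paper's detection F1 may differ.

```python
def rect(r0, c0, r1, c1):
    """Toy mask: the set of (row, col) pixels in an end-exclusive rectangle."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two pixel sets."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def iou(a, b):
    """Intersection over union between two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def panoptic_quality(pred_instances, true_instances, iou_threshold=0.5):
    """Return (DQ, SQ, PQ) for one class.

    With a threshold of 0.5 each ground-truth instance can match at most one
    prediction, so a simple first-match loop is sufficient.
    """
    matched = set()
    iou_sum = 0.0
    for truth in true_instances:
        for i, pred in enumerate(pred_instances):
            if i in matched:
                continue
            score = iou(pred, truth)
            if score > iou_threshold:
                matched.add(i)
                iou_sum += score
                break
    tp = len(matched)
    fp = len(pred_instances) - tp
    fn = len(true_instances) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    dq = tp / denom if denom else 0.0   # detection quality, an F1-style term
    sq = iou_sum / tp if tp else 0.0    # mean IoU of the matched pairs
    return dq, sq, dq * sq

# Toy data (assumed for illustration): one good match, one miss, one false alarm.
true_a, true_b = rect(0, 0, 2, 2), rect(5, 5, 7, 7)
pred_a, pred_c = rect(0, 0, 2, 3), rect(10, 10, 12, 12)

dice_value = dice(pred_a, true_a)                       # 0.8
dq, sq, pq = panoptic_quality([pred_a, pred_c], [true_a, true_b])
print(round(dice_value, 3), round(dq, 3), round(sq, 3), round(pq, 3))
```

In this toy case one of two nuclei is found and one prediction is spurious, so DQ is 0.5, SQ is the matched IoU of 2/3, and PQ is their product, 1/3. A binary PQ would apply this to all nuclei as a single class, while a multiclass PQ would average the per-class values.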
What problem does this paper attempt to address?
This paper addresses automatic detection and segmentation in pathological image processing, in particular nuclei detection and multiclass panoptic segmentation. The researchers propose a method built on a visual foundation model, SAM (Segment Anything Model), that automatically generates bounding boxes to serve as prompts, without additional training or fine-tuning of SAM itself, and uses them to produce high-quality primitive segmentation maps. The method aims to reduce pathologists' annotation workload, shorten the time needed to draw nuclear boundaries, and provide fine-grained segmentation masks at inference time. The key points of the paper include:

1. **Innovative feature extraction method**: Extract salient features from each layer of the SAM Transformer encoder to improve overall performance.
2. **Unique architecture design**: Combine the Transformer encoder with a bottom-up convolutional neural network (CNN) decoder to make detection and classification more robust.
3. **End-to-end network**: Use SAM to directly output the bounding boxes and segmentation masks of classified objects, eliminating the need for post-processing steps.

Through these methods, the authors aim to improve the efficiency and accuracy of pathological image processing while reducing the complexity and time cost of annotation.
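The box-as-prompt flow described above can be sketched as a small data-flow example: each classified bounding box is handed to a mask decoder, and the resulting masks are written into one instance map. This is only a toy illustration under stated assumptions, not the paper's implementation. `Detection`, `toy_mask_decoder`, and `boxes_to_instance_map` are hypothetical names, the decoder is a simple intensity threshold standing in for the SAM decoder, the class labels are illustrative, and the rule that an already claimed pixel keeps its first instance ID is an assumption made here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Box = Tuple[int, int, int, int]      # (row0, col0, row1, col1), end-exclusive
Image = List[List[float]]
Mask = Set[Tuple[int, int]]

@dataclass
class Detection:
    """One classified bounding box, as a detection head might emit it."""
    box: Box
    label: str

def toy_mask_decoder(image: Image, box: Box, threshold: float = 0.5) -> Mask:
    """Stand-in for a promptable decoder: bright pixels inside the box."""
    r0, c0, r1, c1 = box
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)
            if image[r][c] > threshold}

def boxes_to_instance_map(
    image: Image,
    detections: List[Detection],
    mask_decoder: Callable[[Image, Box], Mask],
) -> Tuple[List[List[int]], Dict[int, str]]:
    """Prompt the decoder once per box and paint the masks into one map."""
    height, width = len(image), len(image[0])
    instance_map = [[0] * width for _ in range(height)]  # 0 means background
    labels: Dict[int, str] = {}
    for instance_id, det in enumerate(detections, start=1):
        for r, c in mask_decoder(image, det.box):
            if instance_map[r][c] == 0:   # assumption: first instance keeps the pixel
                instance_map[r][c] = instance_id
        labels[instance_id] = det.label
    return instance_map, labels

# Toy image with two bright blobs (assumed data).
image = [
    [0.9, 0.9, 0.1, 0.1, 0.1, 0.1],
    [0.9, 0.8, 0.1, 0.1, 0.7, 0.9],
    [0.1, 0.1, 0.1, 0.1, 0.8, 0.9],
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
]
detections = [
    Detection(box=(0, 0, 3, 3), label="neoplastic"),
    Detection(box=(1, 3, 4, 6), label="inflammatory"),
]

instance_map, labels = boxes_to_instance_map(image, detections, toy_mask_decoder)
for row in instance_map:
    print(row)
print(labels)
```

With a real SAM checkpoint, the stand-in decoder would be replaced by a box-prompted call such as `SamPredictor.predict(box=...)` from the `segment_anything` package, which takes a box in XYXY pixel coordinates; how the paper wires its detection head into that call is not specified here.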