Progressive Vision-Language Prompt for Multi-Organ Multi-Class Cell Semantic Segmentation with Single Branch

Qing Zhang, Hang Guo, Siyuan Yang, Qingli Li, Yan Wang
2024-12-04
Abstract: Pathological cell semantic segmentation is a fundamental technology in computational pathology, essential for applications like cancer diagnosis and effective treatment. Given that multiple cell types exist across various organs, with subtle differences in cell size and shape, multi-organ, multi-class cell segmentation is particularly challenging. Most existing methods employ multi-branch frameworks to enhance feature extraction, but often result in complex architectures. Moreover, reliance on visual information limits performance in multi-class analysis due to intricate textural details. To address these challenges, we propose a Multi-OrgaN multi-Class cell semantic segmentation method with a single brancH (MONCH) that leverages vision-language input. Specifically, we design a hierarchical feature extraction mechanism to provide coarse-to-fine-grained features for segmenting cells of various shapes, including high-frequency, convolutional, and topological features. Inspired by the synergy of textual and multi-grained visual features, we introduce a progressive prompt decoder to harmonize multimodal information, integrating features from fine to coarse granularity for better context capture. Extensive experiments on the PanNuke dataset, which has significant class imbalance and subtle cell size and shape variations, demonstrate that MONCH outperforms state-of-the-art cell segmentation methods and vision-language models. Code and implementations will be made publicly available.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses semantic segmentation of cells across multiple organs and multiple classes, with a focus on pathological image analysis. The main challenges are:

1. **Complexity of multi-organ, multi-class cells**: Cell types vary across organs and differ only subtly in shape and size, which makes multi-organ, multi-class cell segmentation very challenging.
2. **Limitations of existing methods**: Most existing methods rely on multi-branch frameworks to enhance feature extraction. These frameworks lead to complex architectures and still perform poorly in multi-class analysis, especially on intricate texture details.
3. **Class imbalance**: Pathological cell segmentation datasets usually have significant class imbalance, which places higher demands on the model's feature extraction ability.
4. **Limitations of visual information**: Relying solely on visual information for multi-class cell segmentation ignores the complementary information that text can provide about cell features.

To address these problems, the authors propose MONCH (Multi-OrgaN multi-Class cell semantic segmentation with a single brancH), which combines vision-language input to achieve efficient and accurate multi-organ, multi-class cell segmentation with a single-branch network. Its main components are:

1. **Hierarchical feature extraction mechanism**: A coarse-to-fine mechanism provides features of different granularities (high-frequency, convolutional, and topological features) to better segment cells of various shapes.
2. **Progressive prompt decoder (PPD)**: The decoder fuses text with multi-granularity visual features, progressively integrating features from fine to coarse granularity to better capture context.
3. **Multi-granularity visual feature extraction module (MGFE)**: To cope with variation in shape and size across cell types, this module extracts comprehensive visual features at multiple granularities.
4. **High-frequency information extraction module**: For fine-grained features, a high-pass filtering module enhances the texture features of cells.
5. **Topological structure extraction module**: For coarse-grained features, this module captures the internal structure and shape information of the cell distribution.

On the PanNuke dataset, MONCH outperforms existing state-of-the-art methods on multi-organ, multi-class cell segmentation while keeping the simplicity and efficiency of a single-branch architecture.

### Formula Summary

1. **Multi-granularity image features**:
   \[
   F_X=\{F_c, F_m, F_f\}
   \]
   where
   \[
   F_m = G_{\mathrm{VLM}}(X, S),\quad F_c = G_{\mathrm{ds}}(F_m),\quad F_f = G_{\mathrm{us}}(F_m)
   \]
   so the coarse-grained features \(F_c\) and fine-grained features \(F_f\) are both derived from the vision-language features \(F_m\).
2. **High-frequency feature extraction**:
   \[
   \begin{aligned}
   F_f'(x, y) &= \mathcal{F}\big(F_f(x, y)\big)\\
   F_f''(x, y) &= H\big(F_f'(x, y)\big)\\
   F_h(x, y) &= \mathcal{F}^{-1}\big(F_f''(x, y)\big) + F_f(x, y)
   \end{aligned}
   \]
   where \(\mathcal{F}\) and \(\mathcal{F}^{-1}\) are the 2D Fourier transform and its inverse and \(H\) is the high-pass filter; the residual term \(F_f\) preserves the original fine-grained features.
3. **Multi-head self-attention mechanism**:
   \[
   G_{ms}(F_h, F_v)=\operatorname{softmax}\left(\frac{g_q(F_h)\, g_k(F_v)^\top}{\sqrt{d_k}}\right) g_v(F_v)
   \]
   where \(g_q\), \(g_k\), and \(g_v\) are the query, key, and value projections and \(d_k\) is the key dimension; queries come from \(F_h\), keys and values from \(F_v\).
4. **Segmentation loss function**:
   \[
   L_{seg}(y, p)=-\frac{1}{N}\sum_{n = 1}^{B}\sum_{i = 1}^{X}\left[y_{in}\log(p_{in})+(1 - y_{in})\log(1 - p_{in})\right]
   \]
   a binary cross-entropy between the ground truth \(y\) and the predicted probabilities \(p\).
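The summary does not say how the downsampling operator \(G_{\mathrm{ds}}\) and upsampling operator \(G_{\mathrm{us}}\) in the multi-granularity feature formula are implemented. As a hedged illustration of the shapes involved only, the sketch below assumes 2x2 average pooling for \(G_{\mathrm{ds}}\) and 2x nearest-neighbour upsampling for \(G_{\mathrm{us}}\), and takes \(F_m\) as a precomputed (C, H, W) array standing in for \(G_{\mathrm{VLM}}(X, S)\). All function names are hypothetical.

```python
import numpy as np


def downsample_avg(f: np.ndarray, factor: int = 2) -> np.ndarray:
    """Hypothetical stand-in for G_ds: average-pool a (C, H, W) map by `factor`."""
    c, h, w = f.shape
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by factor")
    # Split each spatial axis into (blocks, factor) and average within each block.
    return f.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def upsample_nearest(f: np.ndarray, factor: int = 2) -> np.ndarray:
    """Hypothetical stand-in for G_us: nearest-neighbour upsampling of a (C, H, W) map."""
    return f.repeat(factor, axis=1).repeat(factor, axis=2)


def multi_granularity(f_m: np.ndarray) -> dict:
    """Return F_X = {F_c, F_m, F_f} built from the mid-level map F_m."""
    return {"F_c": downsample_avg(f_m), "F_m": f_m, "F_f": upsample_nearest(f_m)}


if __name__ == "__main__":
    # Random array standing in for G_VLM(X, S); the real encoder is not modelled here.
    f_m = np.random.default_rng(0).standard_normal((64, 32, 32))
    for name, feat in multi_granularity(f_m).items():
        print(name, feat.shape)
```

Running it prints the three shapes, with \(F_c\) at half and \(F_f\) at double the spatial resolution of \(F_m\). A learned strided convolution or transposed convolution would be an equally plausible choice; the sketch only fixes the resolution relationship between the three granularities.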
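The high-frequency feature extraction formulas (Fourier transform, high-pass filter \(H\), inverse transform, residual addition) can be sketched with NumPy's FFT routines. The summary does not specify \(H\), so this sketch assumes an ideal circular high-pass mask with a hypothetical `cutoff` radius; the function names are likewise illustrative rather than the authors' code.

```python
import numpy as np


def high_pass_mask(h: int, w: int, cutoff: float) -> np.ndarray:
    """Ideal circular high-pass mask H for a centred (fftshift-ed) spectrum.

    Assumption: the summary does not specify H; this zeroes every frequency bin
    within `cutoff` bins of the zero-frequency centre and keeps the rest.
    """
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    return (dist > cutoff).astype(np.float64)


def high_frequency_features(f_f: np.ndarray, cutoff: float = 4.0) -> np.ndarray:
    """F_h = F^{-1}(H(F(F_f))) + F_f, applied independently to each channel of (C, H, W)."""
    _, h, w = f_f.shape
    axes = (-2, -1)
    # F'_f: 2D FFT per channel, shifted so the zero frequency sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(f_f, axes=axes), axes=axes)
    # F''_f = H(F'_f): the (H, W) mask broadcasts over the channel axis.
    filtered = spectrum * high_pass_mask(h, w, cutoff)
    # Inverse FFT back to the spatial domain; the imaginary part is only rounding noise
    # because the mask is symmetric and the input is real.
    high = np.fft.ifft2(np.fft.ifftshift(filtered, axes=axes), axes=axes).real
    # Residual connection: add the original fine-grained features back.
    return high + f_f
```

With this filter, a constant feature map comes back unchanged (its only frequency component, DC, is removed before the residual is added), while fine texture such as a checkerboard passes the filter and is amplified by the residual. A smooth mask, for example a Gaussian high-pass, would reduce the ringing that an ideal mask introduces.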
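The attention formula \(G_{ms}\) is scaled dot-product attention whose queries come from \(F_h\) and whose keys and values come from \(F_v\). The sketch below assumes \(g_q\), \(g_k\), \(g_v\) are bias-free linear projections given as weight matrices, treats both inputs as flattened token sequences, and forms multiple heads by concatenating independent single-head outputs without an output projection. These are illustrative choices, not details confirmed by the summary.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def g_ms(f_h, f_v, w_q, w_k, w_v):
    """Single head: softmax(g_q(F_h) g_k(F_v)^T / sqrt(d_k)) g_v(F_v).

    f_h: (N_h, D) tokens providing queries; f_v: (N_v, D) tokens providing keys/values.
    Assumption: g_q, g_k, g_v are plain linear maps with weights w_q, w_k, w_v.
    """
    q, k, v = f_h @ w_q, f_v @ w_k, f_v @ w_v
    d_k = k.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # (N_h, N_v); each row sums to 1
    return attn @ v  # (N_h, d_v)


def multi_head(f_h, f_v, heads):
    """Concatenate per-head outputs; `heads` is a list of (w_q, w_k, w_v) tuples."""
    return np.concatenate([g_ms(f_h, f_v, *w) for w in heads], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, d_k, n_heads = 32, 8, 4
    f_h = rng.standard_normal((100, d))  # e.g. a flattened 10x10 high-frequency map
    f_v = rng.standard_normal((20, d))   # 20 tokens standing in for F_v
    heads = [tuple(rng.standard_normal((d, d_k)) for _ in range(3)) for _ in range(n_heads)]
    print(multi_head(f_h, f_v, heads).shape)  # (100, 32)
```

Because every row of the attention matrix sums to 1, each output token is a weighted average of the projected \(F_v\) tokens; the number of output tokens always matches the number of \(F_h\) tokens.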
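The segmentation loss \(L_{seg}\) is a binary cross-entropy. The summary does not define \(N\), so the sketch below assumes \(N\) equals the total number of summed terms, which turns the loss into a plain mean. For multi-class masks it also assumes one-hot targets with per-class sigmoid probabilities, so every class channel contributes terms. The function name and the `eps` clipping are illustrative.

```python
import numpy as np


def seg_bce_loss(y: np.ndarray, p: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over all elements of y and p.

    Assumption: N = total number of summed terms (batch x classes x pixels).
    y: binary targets; p: predicted probabilities with the same shape.
    """
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    terms = y * np.log(p) + (1.0 - y) * np.log(1.0 - p)
    return float(-terms.sum() / terms.size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Batch of 2 images, 5 class channels, 8x8 pixels; one-hot targets per pixel.
    labels = rng.integers(0, 5, size=(2, 8, 8))
    y = np.moveaxis(np.eye(5)[labels], -1, 1)  # (2, 5, 8, 8)
    logits = rng.standard_normal(y.shape)
    p = 1.0 / (1.0 + np.exp(-logits))  # per-class sigmoid probabilities
    print(seg_bce_loss(y, p))
```

A prediction of 0.5 everywhere gives a loss of \(\log 2 \approx 0.693\), and the loss approaches 0 as \(p\) approaches \(y\). Given the class imbalance noted for PanNuke, a plain mean like this weights every pixel equally, so rare classes contribute few terms.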