Abstract: Pathological cell semantic segmentation is a fundamental technology in computational pathology, essential for applications like cancer diagnosis and effective treatment. Given that multiple cell types exist across various organs, with subtle differences in cell size and shape, multi-organ, multi-class cell segmentation is particularly challenging. Most existing methods employ multi-branch frameworks to enhance feature extraction, but often result in complex architectures. Moreover, reliance on visual information limits performance in multi-class analysis due to intricate textural details. To address these challenges, we propose a Multi-OrgaN multi-Class cell semantic segmentation method with a single brancH (MONCH) that leverages vision-language input. Specifically, we design a hierarchical feature extraction mechanism to provide coarse-to-fine-grained features for segmenting cells of various shapes, including high-frequency, convolutional, and topological features. Inspired by the synergy of textual and multi-grained visual features, we introduce a progressive prompt decoder to harmonize multimodal information, integrating features from fine to coarse granularity for better context capture. Extensive experiments on the PanNuke dataset, which has significant class imbalance and subtle cell size and shape variations, demonstrate that MONCH outperforms state-of-the-art cell segmentation methods and vision-language models. Codes and implementations will be made publicly available.

What problem does this paper attempt to address?

The paper addresses semantic segmentation of cells across multiple organs and multiple classes in pathological image analysis. The main challenges are:

1. **Complexity of multi-organ, multi-class cells**: Cell types differ across organs, and their shapes and sizes differ only subtly, which makes multi-organ, multi-class cell segmentation very challenging.
2. **Limitations of existing methods**: Most existing methods rely on multi-branch frameworks to enhance feature extraction. These frameworks often lead to complex architectures and perform poorly in multi-class analysis, especially on intricate textural details.
3. **Data imbalance**: Pathological cell segmentation datasets usually have significant class imbalance, which places higher demands on the model's feature extraction ability.
4. **Limitations of visual information**: Relying solely on visual information for multi-class cell segmentation may ignore the complementary role of text information in describing cell features.

To address these problems, the authors propose MONCH (Multi-OrgaN multi-Class cell semantic segmentation with a single brancH), which uses vision-language input and aims to achieve efficient and accurate multi-organ, multi-class cell segmentation with a single-branch network. Its components are:

1. **Hierarchical feature extraction mechanism**: A coarse-to-fine feature extraction mechanism provides features of different granularities (high-frequency, convolutional, and topological features) so that cells of various shapes can be segmented.
2. **Progressive prompt decoder (PPD)**: A progressive prompt decoder fuses text and multi-granularity visual features, gradually combining fine-grained features with coarse-grained features to better capture context.
3. **Multi-granularity visual feature extraction module (MGFE)**: To cope with the shape and size variation of different cell types, this module extracts comprehensive visual features.
4. **High-frequency information extraction module**: For fine-grained features, a high-pass filtering module enhances the texture features of cells.
5. **Topological structure extraction module**: For coarse-grained features, this module captures the internal structure and shape information of the cell distribution.

On the PanNuke dataset, MONCH outperforms existing state-of-the-art methods in multi-organ, multi-class cell segmentation while keeping the simplicity and efficiency of a single-branch architecture.

### Formula Summary

1. **Multi-granularity image feature calculation**:

\[ F_X = \{F_c, F_m, F_f\} \]

where

\[ F_m = G_{\text{VLM}}(X, S), \quad F_c = G_{ds}(F_m), \quad F_f = G_{us}(F_m) \]

2. **High-frequency feature extraction**:

\[ F_f'(x, y) = \mathcal{F}(F_f(x, y)) \]

\[ F_f''(x, y) = H(F_f'(x, y)) \]

\[ F_h(x, y) = \mathcal{F}^{-1}(F_f''(x, y)) + F_f(x, y) \]

Here \(\mathcal{F}\) and \(\mathcal{F}^{-1}\) denote a frequency transform and its inverse, and \(H\) is the high-pass filter.

3. **Multi-head self-attention mechanism**:

\[ G_{ms}(F_h, F_v) = \text{softmax}\left(\frac{g_q(F_h) \cdot g_k(F_v)^\top}{\sqrt{d_k}}\right) \cdot g_v(F_v) \]

4. **Segmentation loss function**:

\[ L_{\text{seg}}(y, p) = -\frac{1}{N}\sum_{n=1}^{B}\sum_{i=1}^{X}\left[y_{in}\log(p_{in}) + (1 - y_{in})\log(1 - p_{in})\right] \]
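
The high-frequency extraction (formula 2) and the segmentation loss (formula 4) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the transform in formula 2 is a 2-D discrete Fourier transform, that the high-pass filter is a hard cutoff around the zero frequency, and that N in formula 4 is the total number of summed entries; the summary states none of these explicitly. The function names `dft2`, `high_pass_mask`, `high_frequency_features`, and `segmentation_loss` and the `cutoff` parameter are hypothetical. Plain Python is used to keep the example self-contained; on real feature maps one would normally reach for `numpy.fft.fft2` and `numpy.fft.ifft2` instead of the naive transform.

```python
import cmath
import math


def dft2(grid, inverse=False):
    """Naive 2-D discrete Fourier transform (or its inverse) of a list of lists."""
    rows, cols = len(grid), len(grid[0])
    sign = 1 if inverse else -1
    scale = 1.0 / (rows * cols) if inverse else 1.0
    out = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = 2 * math.pi * (u * x / rows + v * y / cols)
                    total += grid[x][y] * cmath.exp(sign * 1j * angle)
            out_row.append(total * scale)
        out.append(out_row)
    return out


def high_pass_mask(rows, cols, cutoff):
    """Assumed form of H: drop frequencies within `cutoff` of the zero frequency."""
    mask = []
    for u in range(rows):
        du = min(u, rows - u)  # wrapped distance from the zero frequency
        mask.append([
            0.0 if math.hypot(du, min(v, cols - v)) <= cutoff else 1.0
            for v in range(cols)
        ])
    return mask


def high_frequency_features(f_f, cutoff=1.0):
    """Compute F_h from a single-channel fine-grained map F_f."""
    rows, cols = len(f_f), len(f_f[0])
    spectrum = dft2(f_f)  # F_f': spectrum of F_f
    mask = high_pass_mask(rows, cols, cutoff)
    filtered = [  # F_f'': high-pass filtered spectrum
        [spectrum[u][v] * mask[u][v] for v in range(cols)] for u in range(rows)
    ]
    detail = dft2(filtered, inverse=True)  # back to the spatial domain
    # F_h: high-frequency detail plus the residual connection to F_f
    return [[detail[x][y].real + f_f[x][y] for y in range(cols)] for x in range(rows)]


def segmentation_loss(y, p, eps=1e-12):
    """Binary cross-entropy averaged over all entries (N assumed to be their count)."""
    total, count = 0.0, 0
    for y_row, p_row in zip(y, p):
        for y_i, p_i in zip(y_row, p_row):
            p_i = min(max(p_i, eps), 1.0 - eps)  # keep log() finite
            total += y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
            count += 1
    return -total / count
```

Under these assumptions, a constant feature map contains only the zero frequency, so the filter removes everything and the residual term returns the input unchanged, while a checkerboard pattern passes the filter intact and comes out doubled. The residual connection is what keeps the original fine-grained features available alongside the enhanced texture detail.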

Progressive Vision-Language Prompt for Multi-Organ Multi-Class Cell Semantic Segmentation with Single Branch

Multi-Scale and Multi-Branch Convolutional Neural Network for Retinal Image Segmentation

Joint Feature Learning for Cell Segmentation Based on Multi-scale Convolutional U-Net.

Multiscale Progressive Text Prompt Network for Medical Image Segmentation

Advanced Multi-Microscopic Views Cell Semi-supervised Segmentation

Multi-Modal Prototypes for Open-World Semantic Segmentation

PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models

MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation

MOSMOS: Multi-organ segmentation facilitated by medical report supervision

Look in Different Views: Multi-Scheme Regression Guided Cell Instance Segmentation

Multi-stream Cell Segmentation with Low-level Cues for Multi-modality Images

SegCLIP: Multimodal Visual-Language and Prompt Learning for High-Resolution Remote Sensing Semantic Segmentation

Towards a Visual-Language Foundation Model for Computational Pathology

Visual Prompting Based Incremental Learning for Semantic Segmentation of Multiplex Immuno-Flourescence Microscopy Imagery

Superpixel Semantics Representation and Pre-training for Vision-Language Task

Multi-Bottleneck Progressive Propulsion Network for Medical Image Semantic Segmentation with Integrated Macro-Micro Dual-Stage Feature Enhancement and Refinement

SegAnyPath: A Foundation Model for Multi-resolution Stain-variant and Multi-task Pathology Image Segmentation

The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

ICPC: Instance-Conditioned Prompting with Contrastive Learning for Semantic Segmentation

CellSAM: Advancing Pathologic Image Cell Segmentation via Asymmetric Large‐Scale Vision Model Feature Distillation Aggregation Network