SemSim: Revisiting Weak-to-Strong Consistency from a Semantic Similarity Perspective for Semi-supervised Medical Image Segmentation

Shiao Xie,Hongyi Wang,Ziwei Niu,Hao Sun,Shuyi Ouyang,Yen-Wei Chen,Lanfen Lin
2024-10-17
Abstract: Semi-supervised learning (SSL) for medical image segmentation is a challenging yet highly practical task, which reduces reliance on large-scale labeled datasets by leveraging unlabeled samples. Among SSL techniques, the weak-to-strong consistency framework, popularized by FixMatch, has emerged as a state-of-the-art method in classification tasks. Notably, such a simple pipeline has also shown competitive performance in medical image segmentation. However, two key limitations still persist, impeding its efficient adaptation: (1) the neglect of contextual dependencies results in inconsistent predictions for similar semantic features, leading to incomplete object segmentation; (2) the lack of exploitation of semantic similarity between labeled and unlabeled data induces considerable class-distribution discrepancy. To address these limitations, we propose a novel semi-supervised framework based on FixMatch, named SemSim, powered by two appealing designs from a semantic similarity perspective: (1) rectifying pixel-wise prediction by reasoning about the intra-image pair-wise affinity map, thus integrating contextual dependencies explicitly into the final prediction; (2) bridging labeled and unlabeled data via a feature querying mechanism for compact class representation learning, which fully considers cross-image anatomical similarities. As reliable semantic similarity extraction depends on robust features, we further introduce an effective spatial-aware fusion module (SFM) to explore distinctive information from multiple scales. Extensive experiments show that SemSim yields consistent improvements over the state-of-the-art methods across three public segmentation benchmarks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to improve model performance in semi-supervised medical image segmentation, where only a small amount of labeled data is available alongside a large amount of unlabeled data. Specifically, it identifies two main problems that arise when existing methods such as FixMatch are applied to medical image segmentation:

1. **Intra-image problem**: Existing methods focus only on pixel-level prediction and ignore contextual dependencies within an image, so similar semantic features receive inconsistent predictions and objects are segmented incompletely.
2. **Cross-image problem**: Because labeled data is limited and insufficiently utilized, the class distributions learned from labeled data and from unlabeled data differ significantly.

To overcome these problems, the paper proposes a new semi-supervised framework, SemSim, which improves model performance by introducing **intra-image semantic consistency** and **cross-image semantic consistency**. The specific measures are:

- **Intra-image semantic consistency**: Refine the original pixel-level prediction using a feature-level affinity map, which yields more stable predictions and explicitly propagates contextual dependencies to the model's output.
- **Cross-image semantic consistency**: Use the reliable class distribution in the labeled data to generate predictions for unlabeled data through a dynamic feature query mechanism, thereby promoting compact intra-class distributions.

In addition, the paper designs a lightweight spatial-aware fusion module (SFM) that produces stronger feature representations, so that reliable correlations in the data are captured more accurately.
The experimental results show that SemSim has achieved better results than the existing methods on three publicly available medical image segmentation benchmark datasets.
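The two consistency ideas summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes cosine similarity between features, a softmax temperature, and class-mean prototypes built from labeled pixels, none of which are specified in the summary. The function names `affinity_refine` and `prototype_query` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def affinity_refine(features, probs, temperature=0.1):
    """Intra-image idea: refine per-pixel predictions with a pairwise affinity map.

    features: (N, D) features of the N pixels of one image.
    probs:    (N, C) per-pixel class probabilities.
    Returns (N, C) refined probabilities (each row still sums to 1).
    Cosine similarity and the temperature are assumptions of this sketch.
    """
    f = l2_normalize(features)
    affinity = softmax(f @ f.T / temperature, axis=1)  # (N, N), rows sum to 1
    return affinity @ probs  # each pixel averages predictions of similar pixels

def prototype_query(labeled_feats, labels, unlabeled_feats, num_classes, temperature=0.1):
    """Cross-image idea: query unlabeled features against labeled class prototypes.

    labeled_feats:   (M, D) features of labeled pixels.
    labels:          (M,) integer class labels; every class is assumed present.
    unlabeled_feats: (N, D) features of unlabeled pixels.
    Returns (N, C) class probabilities for the unlabeled pixels.
    Class-mean prototypes are an assumption of this sketch.
    """
    protos = np.stack([labeled_feats[labels == c].mean(axis=0) for c in range(num_classes)])
    sims = l2_normalize(unlabeled_feats) @ l2_normalize(protos).T  # (N, C)
    return softmax(sims / temperature, axis=1)

# Toy data: two feature clusters. Pixel 1 lies in the first cluster,
# but its raw prediction favors class 1.
features = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]])
refined = affinity_refine(features, probs)

labels = np.array([0, 0, 1, 1])
unlabeled = np.array([[0.9, 0.05], [0.05, 0.9]])
queried = prototype_query(features, labels, unlabeled, num_classes=2)
```

In the toy data, the raw prediction for pixel 1 disagrees with its nearest neighbor in feature space. Averaging predictions through the affinity map pulls it toward that neighbor's class, which is the kind of inconsistency the intra-image term targets. A dense N×N affinity map is only practical for small feature maps, so a real model would likely compute it at reduced resolution.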