Self-supervised Semantic Segmentation: Consistency over Transformation

Sanaz Karimijafarbigloo, Reza Azad, Amirhossein Kazerouni, Yury Velichko, Ulas Bagci, Dorit Merhof
2023-09-01
Abstract: Accurate medical image segmentation is of utmost importance for enabling automated clinical decision procedures. However, prevailing supervised deep learning approaches for medical image segmentation encounter significant challenges due to their heavy dependence on extensive labeled training data. To tackle this issue, we propose a novel self-supervised algorithm, S$^3$-Net, which integrates a robust framework based on the proposed Inception Large Kernel Attention (I-LKA) modules. This architectural enhancement makes it possible to comprehensively capture contextual information while preserving local intricacies, thereby enabling precise semantic segmentation. Furthermore, considering that lesions in medical images often exhibit deformations, we leverage deformable convolution as an integral component to effectively capture and delineate lesion deformations for superior object boundary definition. Additionally, our self-supervised strategy emphasizes the acquisition of invariance to affine transformations, which are commonly encountered in medical scenarios. This emphasis on robustness with respect to geometric distortions significantly enhances the model's ability to accurately model and handle such distortions. To enforce spatial consistency and promote the grouping of spatially connected image pixels with similar feature representations, we introduce a spatial consistency loss term. This aids the network in effectively capturing the relationships among neighboring pixels and enhancing the overall segmentation quality. The S$^3$-Net approach iteratively learns pixel-level feature representations for image content clustering in an end-to-end manner. Our experimental results on skin lesion and lung organ segmentation tasks show the superior performance of our method compared to the SOTA approaches. Code: https://github.com/mindflow-institue/SSCT
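The affine-invariance idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: a horizontal flip and a 90-degree rotation stand in for general affine transformations (both are special cases), a plain pixel disagreement rate stands in for whatever loss the paper actually uses, and the names `threshold_predictor`, `left_biased_predictor`, and `consistency_loss` are hypothetical. The point is only that a prediction made on a transformed image should match the transformed prediction of the original image.

```python
def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def rot90(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(col) for col in zip(*grid[::-1])]


def disagreement(labels_a, labels_b):
    """Fraction of pixels whose labels differ between two same-sized grids."""
    total = 0
    mismatched = 0
    for row_a, row_b in zip(labels_a, labels_b):
        for a, b in zip(row_a, row_b):
            total += 1
            mismatched += int(a != b)
    return mismatched / total


def threshold_predictor(image, t=0.5):
    """Hypothetical per-pixel predictor: label depends on intensity only."""
    return [[1 if v > t else 0 for v in row] for row in image]


def left_biased_predictor(image, t=0.5):
    """Hypothetical position-dependent predictor: foreground only in the left half."""
    width = len(image[0])
    return [
        [1 if (v > t and j < width // 2) else 0 for j, v in enumerate(row)]
        for row in image
    ]


def consistency_loss(predict, image, transform):
    """Compare predict(transform(image)) with transform(predict(image))."""
    return disagreement(predict(transform(image)), transform(predict(image)))


image = [
    [0.9, 0.1, 0.8, 0.2],
    [0.7, 0.3, 0.6, 0.4],
]

# The intensity-only predictor commutes with both transforms, so the loss is zero.
print(consistency_loss(threshold_predictor, image, hflip))
print(consistency_loss(threshold_predictor, image, rot90))
# The position-dependent predictor does not, so the loss is positive.
print(consistency_loss(left_biased_predictor, image, hflip))
```

In a training setting, a differentiable version of such a term would penalize predictors that behave like `left_biased_predictor`, pushing the network toward outputs that follow the geometry of the input.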
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is the dependence of medical image segmentation on large amounts of labeled data. Specifically:

1. **Scarcity of labeled data**: In medical image analysis, the large volume of images and the need for precise annotation make producing extensive manually labeled data both time-consuming and expensive. Manual labeling is also prone to human error. This limits the effectiveness of supervised learning methods in medical image segmentation tasks.
2. **Limitations of existing methods**:
   - **Transfer learning**: Although it can serve as a baseline, the scarcity of labeled data in downstream tasks limits the network's convergence and its ability to learn task-specific features, resulting in an unstable model.
   - **Unsupervised methods**: These methods learn features directly from the data itself, but lack labels or metrics to verify their effectiveness, so their reliability cannot always be guaranteed.
   - **Semi-supervised methods**: Although they reduce the need for large amounts of manual labeling, they still require a small amount of labeled data, and producing it remains time-consuming, expensive, and dependent on domain experts. Labeling bias is a further limitation.
3. **Advantages of self-supervised learning**: Self-supervised learning removes the need for manual labeling by introducing a series of pretext tasks that generate supervision signals from large amounts of unlabeled data. In particular, Contrastive Learning (CL) methods can achieve performance comparable to state-of-the-art algorithms even with a small amount of labeled data.

To solve these problems, the paper proposes a new self-supervised algorithm named S$^3$-Net.
The main innovations include:

- **I-LKA module**: Designed to comprehensively capture contextual information while retaining local detail, enabling accurate semantic segmentation.
- **Deformable convolution**: Used to capture and delineate the deformation of lesion areas and improve the accuracy of object boundaries.
- **Self-supervised algorithm**: Based on contrastive learning, it emphasizes invariance to affine transformations and enhances the model's ability to handle geometric distortions.
- **Spatial consistency loss**: By modeling edge information, it promotes the grouping of spatially connected pixels and improves segmentation quality.
- **Single-image prediction**: By making predictions based only on a single image, it reduces the impact of dataset bias.

Through these innovations, S$^3$-Net outperforms existing state-of-the-art methods on skin lesion and lung organ segmentation tasks.
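To make the spatial consistency idea more concrete, here is a minimal sketch of an edge-aware smoothness penalty. It is an assumption about the general form of such a term, not the loss defined in the paper: neighboring pixels are penalized for having different features, and the penalty is down-weighted where the image itself has a strong intensity edge. The function name `spatial_consistency`, the exponential weighting, and the `sigma` parameter are all hypothetical choices made for illustration.

```python
import math


def spatial_consistency(features, image, sigma=0.1):
    """Mean edge-weighted feature difference over right and down neighbors.

    features: 2D grid of per-pixel scalar features (assumed scalar for brevity).
    image:    2D grid of intensities with the same shape.
    sigma:    hypothetical scale controlling how quickly image edges
              switch the penalty off.
    """
    height, width = len(image), len(image[0])
    total = 0.0
    pairs = 0
    for i in range(height):
        for j in range(width):
            for di, dj in ((0, 1), (1, 0)):  # right and down neighbors
                ni, nj = i + di, j + dj
                if ni < height and nj < width:
                    edge = abs(image[i][j] - image[ni][nj])
                    weight = math.exp(-edge / sigma)  # near 0 across strong edges
                    total += weight * abs(features[i][j] - features[ni][nj])
                    pairs += 1
    return total / pairs


# Toy image with a vertical edge between columns 1 and 2.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
aligned = [row[:] for row in image]          # feature boundary on the image edge
misaligned = [[0.0, 1.0, 1.0, 1.0],          # feature boundary inside a flat region
              [0.0, 1.0, 1.0, 1.0]]

print(spatial_consistency(aligned, image))
print(spatial_consistency(misaligned, image))
```

A feature boundary that coincides with the image edge is almost free, while one that cuts through a flat region is penalized, which is the behavior that encourages spatially connected pixels with similar appearance to be grouped together.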