Abstract: The data-intensive nature of supervised classification drives researchers' interest towards unsupervised approaches, especially for problems such as medical image segmentation, where labeled data is scarce. Building on recent advances of Vision Transformers (ViT) in computer vision, we propose an unsupervised segmentation framework using a pre-trained Dino-ViT. The proposed method leverages the inherent graph structure within the image to realize a significant performance gain for segmentation in medical images. To this end, we introduce a modularity-based loss function coupled with a Graph Attention Network (GAT) to effectively capture the inherent graph topology within the image. Our method achieves state-of-the-art performance, matching or even significantly surpassing existing (semi-)supervised techniques such as MedSAM, a Segment Anything Model for medical images. We demonstrate this on two challenging medical image datasets, ISIC-2018 and CVC-ColonDB. This work underscores the potential of unsupervised approaches in advancing medical image analysis in scenarios where labeled data is scarce. The code is available on GitHub at https://github.com/mudit-adityaja/UnSegMedGAT.

What problem does this paper attempt to address?

The paper addresses the difficulty of applying supervised learning to medical image segmentation, where labeled data is scarce. It proposes UnSegMedGAT, an unsupervised medical image segmentation framework that combines a Graph Attention Network (GAT) with a pre-trained Vision Transformer (ViT) model.

### Main problems

1. **Scarcity of labeled data**: Obtaining large amounts of labeled data for medical image segmentation is difficult and costly, which limits the applicability of supervised learning methods.
2. **Limitations of existing methods**: Existing supervised and semi-supervised methods perform well on some datasets, but they are biased and generalize poorly to data of different modalities.

### Solutions

To address these problems, the paper proposes the following:

1. **Unsupervised learning framework**:
   - Use the pre-trained Dino-ViT model to extract image features.
   - Construct a graph from these features and optimize the segmentation with a modularity-based loss function.
2. **Graph Attention Network (GAT)**:
   - Use a GAT to capture the inherent graph topology of the image, thereby improving segmentation performance.
   - Use multi-head attention to increase the expressive power of the model.
3. **Novel loss function**:
   - A loss function defined on the modularity matrix optimizes the clustering of the graph.
   - A regularization term is added for the stability and generalization ability of the model.
4. **Experimental verification**:
   - Experiments were carried out on two challenging medical image datasets, ISIC-2018 and CVC-ColonDB, to verify the effectiveness of the proposed method.
   - The experimental results show that UnSegMedGAT matches or significantly outperforms existing unsupervised and semi-supervised methods on multiple evaluation metrics.

### Formula summary

- **Adjacency matrix \( A \)**:
  \[ A = f f^{\top}\cdot\left( f f^{\top} > \tau \right)\in\mathbb{R}^{\frac{st}{p^2}\times\frac{st}{p^2}} \]
  where \( \tau \) is a user-defined threshold parameter.
- **Attention coefficient \( \alpha_{ij} \)**:
  \[ \alpha_{ij}=\frac{\exp(\text{LeakyReLU}(a^{\top}[W h_i \| W h_j]))}{\sum_{k\in N(i)}\exp(\text{LeakyReLU}(a^{\top}[W h_i \| W h_k]))} \]
- **Node output feature vector \( h'_i \)**:
  \[ h'_i = \sigma\left( \sum_{j\in N(i)}\alpha_{ij}W h_j \right) \]
- **Multi-head attention output \( H'_i \)**:
  \[ H'_i=\bigg\|_{z = 1}^{Z}\sigma\left( \sum_{j\in N(i)}\alpha^z_{ij}W^z h_j \right) \]
- **Loss function \( L \)**:
  \[ L = -\frac{1}{2m}\text{Tr}(C^{\top}B C)+\frac{\sqrt{k}}{n}\left\| \sum_i C_i^{\top} \right\|_F - 1 \]
  where \( B = A-\frac{d d^{\top}}{2m} \) is the modularity matrix, \( d \) is the degree vector, \( m = |E| \), \( C \) is the cluster assignment matrix with rows \( C_i \), \( k \) is the number of clusters, and \( n \) is the number of nodes.

Through these innovations, the paper demonstrates the potential of unsupervised methods in medical image segmentation.
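The formulas above can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation: the function names, the toy patch features, the LeakyReLU slope of 0.2, and the use of total edge weight for \( 2m \) are all assumptions made for illustration. The regularization term is read here in the form \( \frac{\sqrt{k}}{n}\|\sum_i C_i^{\top}\|_F - 1 \), as commonly used with modularity-based clustering losses, which is also an assumption about the summary's intent.

```python
import math

def adjacency(features, tau):
    """A = (f f^T) * (f f^T > tau): keep only similarities above tau."""
    n = len(features)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = sum(x * y for x, y in zip(features[i], features[j]))
            A[i][j] = s if s > tau else 0.0
    return A

def gat_attention(h, W, a, neighbors, i, slope=0.2):
    """alpha_ij: softmax over j in N(i) of LeakyReLU(a^T [W h_i || W h_j])."""
    def project(x):  # W x, with W given as a list of rows
        return [sum(w * v for w, v in zip(row, x)) for row in W]

    def score(j):
        z = project(h[i]) + project(h[j])  # list concatenation: [W h_i || W h_j]
        e = sum(p * q for p, q in zip(a, z))
        return e if e > 0 else slope * e   # LeakyReLU (slope is an assumption)

    scores = {j: score(j) for j in neighbors[i]}
    top = max(scores.values())             # subtract the max for numerical stability
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

def modularity_loss(A, C):
    """-Tr(C^T B C) / (2m) + (sqrt(k) / n) * ||sum_i C_i||_F - 1."""
    n, k = len(C), len(C[0])
    d = [sum(row) for row in A]            # weighted node degrees
    two_m = sum(d)                         # 2m taken as total edge weight (assumption)
    trace = 0.0                            # Tr(C^T B C) = sum_ij B_ij (C_i . C_j)
    for i in range(n):
        for j in range(n):
            b_ij = A[i][j] - d[i] * d[j] / two_m
            trace += b_ij * sum(C[i][c] * C[j][c] for c in range(k))
    col_sums = [sum(C[i][c] for i in range(n)) for c in range(k)]
    collapse = math.sqrt(k) / n * math.sqrt(sum(s * s for s in col_sums)) - 1.0
    return -trace / two_m + collapse

# Toy example: four "patches" whose features form two obvious groups.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
A = adjacency(feats, tau=0.5)
good = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # one cluster per group
collapsed = [[1.0, 0.0]] * 4                             # every patch in one cluster
print(modularity_loss(A, good), modularity_loss(A, collapsed))
```

In this toy case the assignment that separates the two groups receives a lower loss than the collapsed assignment, which is the behavior the regularization term is meant to encourage. A practical implementation would use a tensor library and learn \( C \) from the GAT outputs rather than fixing it by hand.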

UnSegMedGAT: Unsupervised Medical Image Segmentation using Graph Attention Networks Clustering
