UnSegMedGAT: Unsupervised Medical Image Segmentation using Graph Attention Networks Clustering

A. Mudit Adityaja, Saurabh J. Shigwan, Nitin Kumar
DOI: https://doi.org/10.48550/arXiv.2411.01966
2024-11-04
Abstract: The data-intensive nature of supervised classification drives the interest of researchers towards unsupervised approaches, especially for problems such as medical image segmentation, where labeled data is scarce. Building on the recent advancements of Vision Transformers (ViT) in computer vision, we propose an unsupervised segmentation framework using a pre-trained Dino-ViT. In the proposed method, we leverage the inherent graph structure within the image to realize a significant performance gain for segmentation in medical images. For this, we introduce a modularity-based loss function coupled with a Graph Attention Network (GAT) to effectively capture the inherent graph topology within the image. Our method achieves state-of-the-art performance, significantly surpassing or matching that of existing (semi)supervised techniques such as MedSAM, which is a Segment Anything Model for medical images. We demonstrate this using two challenging medical image datasets, ISIC-2018 and CVC-ColonDB. This work underscores the potential of unsupervised approaches in advancing medical image analysis in scenarios where labeled data is scarce. The GitHub repository of the code is available at https://github.com/mudit-adityaja/UnSegMedGAT.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the difficulty of applying supervised learning to medical image segmentation, where labeled data is scarce. It proposes UnSegMedGAT, an unsupervised segmentation framework that combines a Graph Attention Network (GAT) with a pre-trained Vision Transformer (ViT) to achieve efficient image segmentation.

### Main problems

1. **Scarcity of labeled data**: Obtaining large amounts of labeled data for medical image segmentation is difficult and costly, which limits the use of supervised learning methods.
2. **Limitations of existing methods**: Existing supervised and semi-supervised methods perform well on some datasets, but they show bias and generalize poorly across data of different modalities.

### Solutions

To address these problems, the paper proposes the following:

1. **Unsupervised learning framework**:
   - Use the pre-trained Dino-ViT model to extract image features.
   - Optimize the segmentation results by constructing a graph structure and introducing a modularity loss function.
2. **Graph Attention Network (GAT)**:
   - Use a GAT to capture the inherent graph topology of the image, thereby improving segmentation performance.
   - Introduce a multi-head attention mechanism to increase the expressive power of the model.
3. **Novel loss function**:
   - A loss function based on the modularity matrix is used to optimize the clustering of the graph.
   - A regularization term is added to support the stability and generalization ability of the model.
4. **Experimental verification**:
   - Experiments were carried out on two challenging medical image datasets, ISIC-2018 and CVC-ColonDB, to verify the effectiveness of the proposed method.
   - The experimental results show that UnSegMedGAT matches or significantly surpasses existing unsupervised and semi-supervised methods on multiple evaluation metrics.

### Formula summary

- **Adjacency matrix \( A \)**:
  \[ A = f f^{\top}\cdot\left( f f^{\top} > \tau \right)\in\mathbb{R}^{\frac{st}{p^2}\times\frac{st}{p^2}} \]
  where \( f \) holds the ViT patch features of an \( s\times t \) image split into \( p\times p \) patches, and \( \tau \) is a user-defined threshold parameter.
- **Attention coefficient \( \alpha_{ij} \)**:
  \[ \alpha_{ij}=\frac{\exp\left(\text{LeakyReLU}\left(a^{\top}[W h_i \,\|\, W h_j]\right)\right)}{\sum_{k\in N(i)}\exp\left(\text{LeakyReLU}\left(a^{\top}[W h_i \,\|\, W h_k]\right)\right)} \]
- **Node output feature vector \( h'_i \)**:
  \[ h'_i = \sigma\left( \sum_{j\in N(i)}\alpha_{ij}W h_j \right) \]
- **Multi-head attention output \( H'_i \)** (concatenation over \( Z \) heads):
  \[ H'_i=\bigg\|_{z = 1}^{Z}\sigma\left( \sum_{j\in N(i)}\alpha^{z}_{ij}W^{z} h_j \right) \]
- **Loss function \( L \)**:
  \[ L = -\frac{1}{2m}\text{Tr}\left(C^{\top}B C\right)+\frac{\sqrt{k}}{n}\left\| \sum_{i} C_i^{\top} \right\|_F - 1 \]
  where \( B = A-\frac{d d^{\top}}{2m} \) is the modularity matrix, \( d \) is the degree vector, \( m = |E| \) is the number of edges, \( C \) is the \( n\times k \) clustering assignment matrix with rows \( C_i \), \( n \) is the number of nodes, and \( k \) is the number of clusters.

Together, these components demonstrate the potential of unsupervised methods for medical image segmentation when labeled data is scarce.
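The formulas summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation (their code is in the linked repository). The function names `build_adjacency`, `gat_head`, and `modularity_loss` are hypothetical, and several choices are assumptions: features are L2-normalized before thresholding, \( \sigma \) is taken to be ELU, \( 2m \) is computed as the sum of weighted degrees, and the regularizer is written in the collapse-regularization form used by Deep Modularity Networks, \( \frac{\sqrt{k}}{n}\left\|\sum_i C_i^{\top}\right\|_F - 1 \).

```python
import numpy as np


def build_adjacency(f, tau):
    """A = f f^T * (f f^T > tau) for patch features f of shape (n, d)."""
    # Assumption: rows are L2-normalized so f f^T holds cosine similarities.
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    sim = f @ f.T
    return sim * (sim > tau)  # keep similarities above tau, zero the rest


def gat_head(h, W, a, adj, slope=0.2):
    """One attention head; returns (alpha, h_out).

    Every node needs at least one neighbour in adj (self-loops count),
    otherwise the masked softmax below is undefined.
    """
    wh = h @ W
    d_out = wh.shape[1]
    # a^T [W h_i || W h_j] splits into a term for node i plus a term for node j.
    e = (wh @ a[:d_out])[:, None] + (wh @ a[d_out:])[None, :]
    e = np.where(e > 0, e, slope * e)     # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)     # restrict the softmax to N(i)
    e = e - e.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    z = alpha @ wh
    # Assumption: sigma is ELU, a common choice for GAT layers.
    h_out = np.where(z > 0, z, np.expm1(np.minimum(z, 0.0)))
    return alpha, h_out


def modularity_loss(A, C):
    """Negative modularity plus the collapse regularizer, for C of shape (n, k)."""
    n, k = C.shape
    d = A.sum(axis=1)                     # (weighted) degree vector
    two_m = d.sum()                       # assumption: 2m = sum of degrees
    B = A - np.outer(d, d) / two_m        # modularity matrix
    modularity = np.trace(C.T @ B @ C) / two_m
    collapse = np.sqrt(k) / n * np.linalg.norm(C.sum(axis=0)) - 1.0
    return -modularity + collapse
```

On a toy graph with two disconnected groups of patches, the assignment that follows the groups yields a lower loss than one that mixes them, which is the behaviour the modularity term is meant to reward. A multi-head layer would call `gat_head` once per head with separate `W` and `a` and concatenate the outputs.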