DTAN: Diffusion-based Text Attention Network for medical image segmentation

Yiyang Zhao, Jinjiang Li, Lu Ren, Zheng Chen
DOI: https://doi.org/10.1016/j.compbiomed.2023.107728
Abstract: Diffusion models have recently emerged as a powerful approach to medical image segmentation. Against this backdrop, we introduce the Diffusion Text-Attention Network (DTAN), a segmentation framework that combines text attention with diffusion models to improve the precision and integrity of medical image segmentation. DTAN uses a text attention mechanism to steer the segmentation process towards regions of interest, which improves the accuracy and robustness of the segmentation. In parallel, the diffusion model reduces the influence of noise and irrelevant background in medical images, allowing the network to capture the characteristics of the target regions more effectively and thereby improving segmentation precision. We evaluated DTAN on three datasets: Kvasir-Sessile, Kvasir-SEG, and GlaS, with particular focus on Kvasir-Sessile because of its relevance to clinical applications. Compared with other state-of-the-art methods, our approach achieved a 2.77% increase in mean Intersection over Union (mIoU) and a 3.06% increase in mean Dice Similarity Coefficient (mDSC) on Kvasir-Sessile. These results provide strong evidence of DTAN's generalizability and robustness, and of its advantages for medical image segmentation.
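For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how IoU and the Dice coefficient are commonly computed for binary masks. It is not the authors' evaluation code: the function names `iou_and_dice` and `mean_scores`, the nested-list mask format, the per-image averaging, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1."""
    # Flatten both masks into lists of booleans.
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    inter = sum(1 for a, b in zip(p, t) if a and b)  # |P ∩ T|
    p_sum, t_sum = sum(p), sum(t)                    # |P|, |T|
    union = p_sum + t_sum - inter                    # |P ∪ T|

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0
    return inter / union, 2 * inter / (p_sum + t_sum)


def mean_scores(pairs):
    """Average IoU and Dice over (pred, target) pairs (assumed per-image mean)."""
    scores = [iou_and_dice(p, t) for p, t in pairs]
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(iou_and_dice(pred, target))
```

For a single pair of masks the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. That identity does not carry over exactly to dataset-level means.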