Gaussian Transformer and CNN Segmentation Method Based on Contrastive Learning of Boundary

Pan, Yanfeng Li, Houjin Chen, Xuzhi Zhao, Luyifu Chen, Chong Zhang
DOI: https://doi.org/10.1109/icivc58118.2023.10270618
2023-01-01
Abstract: Convolutional neural networks (CNNs) are the most widely used deep learning architecture for medical image analysis. However, the limited receptive field of existing CNN-based methods can result in inaccurate segmentation, especially in cases of partial similarity or local environmental changes. Transformer architectures have been introduced to address this issue. However, the self-attention mechanism used in the transformer considers all features through weighted averaging, which limits its ability to capture local dependencies. While this may not be a significant issue for the internal regions of lesions, it makes accurately delineating small lesion boundaries challenging, because local correlation is often more critical than distant correlation in medical image segmentation tasks. Furthermore, the precise segmentation of breast tumors in ultrasound images remains challenging due to unclear boundaries. In this study, we propose an angular-margin contrastive learning model that integrates information from a Gaussian transformer and a CNN decoder. Our model uses a shallow CNN encoder, followed by a CNN decoder and a Gaussian transformer decoder. Features learned from both branches are then fused for joint prediction. To maintain the U-shaped structure between the decoder and encoder, each decoding feature map is skip-connected to its corresponding encoder feature. We introduce Gaussian weighting into conventional multi-head attention to enhance the local correlation modeling ability of self-attention. Additionally, we propose an angular-margin contrastive loss to encourage the model to focus on boundary features and improve segmentation performance. The proposed method is validated on an ABUS dataset, and experimental results demonstrate its state-of-the-art performance, with a Dice similarity coefficient of 0.813 and an HD95 of 2.207 mm.
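The abstract does not give the exact form of the Gaussian weighting, so the following is only a minimal sketch of one common way to bias self-attention toward nearby tokens. It assumes a single head, a Gaussian prior over the squared spatial distance between token positions, and a hypothetical bandwidth parameter `sigma`. The function name `gaussian_attention` and its arguments are illustrative and are not taken from the paper.

```python
import math

def gaussian_attention(q, k, v, positions, sigma=2.0):
    """Single-head attention with a Gaussian distance prior (illustrative sketch).

    q, k, v: lists of n feature vectors (lists of floats).
    positions: list of n token coordinates, e.g. (row, col) on a feature map.
    sigma: assumed bandwidth; smaller values favour closer tokens.
    Returns (outputs, weights), where weights[i][j] is the attention
    that token i pays to token j.
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        logits = []
        for j, kj in enumerate(k):
            # Standard scaled dot-product score.
            score = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            # Adding the log of a Gaussian to the logit is equivalent to
            # multiplying the softmax weight by exp(-dist^2 / (2 sigma^2))
            # and renormalising.
            sq_dist = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            logits.append(score - sq_dist / (2.0 * sigma ** 2))
        # Numerically stable softmax over the keys.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        w = [e / z for e in exps]
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights

# Usage: tokens laid out on a 4x4 feature map.
positions = [(r, c) for r in range(4) for c in range(4)]
features = [[float(r), float(c)] for r, c in positions]
out, attn = gaussian_attention(features, features, features, positions, sigma=1.0)
```

With a large `sigma` the distance term vanishes and the function reduces to ordinary softmax attention. With a small `sigma` each token attends mostly to its spatial neighbours, which is the local-correlation behaviour the abstract describes.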
It is worth noting that our method considers not only global semantics but also local correlation and boundary features, thereby improving segmentation accuracy.
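The abstract names an angular-margin contrastive loss without defining it. As a rough illustration only, the sketch below uses an InfoNCE-style loss in which an additive angular margin is applied to the positive pair, in the spirit of ArcFace-type losses. The margin `margin`, the temperature `tau`, and the choice of anchor, positive, and negative features (for example, boundary-pixel embeddings of the same class versus the other class) are all assumptions and may differ from the paper's formulation.

```python
import math

def _cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def angular_margin_contrastive_loss(anchor, positive, negatives,
                                    margin=0.1, tau=0.1):
    """InfoNCE-style loss with an additive angular margin (illustrative sketch).

    anchor, positive: feature vectors that should be pulled together.
    negatives: list of feature vectors that should be pushed away.
    margin: assumed angular margin in radians, added to the positive angle.
    tau: assumed softmax temperature.
    """
    # Angle between anchor and positive, clamped for numerical safety.
    cos_pos = max(-1.0, min(1.0, _cosine(anchor, positive)))
    theta = math.acos(cos_pos)
    # The margin makes the positive pair look less similar than it is,
    # so the model must separate the classes by more than the margin.
    # The angle is capped at pi so that the cosine stays monotonic.
    pos_logit = math.cos(min(theta + margin, math.pi)) / tau
    neg_logits = [_cosine(anchor, n) / tau for n in negatives]
    # Stable log-sum-exp over the positive and all negatives.
    all_logits = [pos_logit] + neg_logits
    m = max(all_logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in all_logits))
    return log_z - pos_logit

# Usage with toy 2-D embeddings.
loss = angular_margin_contrastive_loss(
    anchor=[1.0, 0.0],
    positive=[0.9, 0.1],
    negatives=[[0.0, 1.0], [-1.0, 0.0]],
)
```

Setting `margin=0.0` recovers a plain contrastive (InfoNCE) loss. A positive margin always yields a loss at least as large for the same inputs, which tightens the decision boundary between the two feature groups.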