VSNet: classification of pulmonary nodules in 3D using vision transformer and sequence spatial attention mechanism

Dongfang Tang, Ting Xiao, Fan Yang, Conghao Zhang, Zhe Wang, Wen Gao
DOI: https://doi.org/10.1007/s11042-024-19475-3
IF: 2.577
2024-06-13
Multimedia Tools and Applications
Abstract: Accurate classification of benign and malignant nodules in Computed Tomography (CT) scans is crucial for the early detection of lung cancer and for Computer-Aided Diagnosis (CAD) systems. Despite significant advances, challenges persist, notably the limited interpretability of the reasoning process and the lack of fine-grained representations. To address these challenges, we propose a 3D VSNet. Our approach incorporates a Sequence Spatial Attention Module (SSAM) that automatically locates the relevant sequence in the encoder's output and the required receptive field region, thereby capturing key features of pulmonary nodules. We leverage the characteristics of shallow and deep features from a 3D Vision Transformer (ViT) and Convolutional Neural Networks (CNNs) to obtain fine-grained representations and improve nodule classification. Additionally, we implement a new training strategy using Supervised Contrastive (SC) loss and Proxy-Anchor (PA) loss to optimize the embedding features of similar samples and the direct cosine distance, and to speed up the convergence of tuple sampling. Our experiments on the LIDC-IDRI dataset demonstrate the effectiveness of the proposed technique, which achieves an accuracy of 90.28% and a precision of 91.63%. Furthermore, we conduct ablation experiments to analyze the contribution and influence of each component of our method.
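The abstract names a Supervised Contrastive (SC) loss but does not give its formula. As a rough illustration only, the sketch below implements the commonly used form of supervised contrastive loss over L2-normalized embeddings in NumPy. The function name, the `temperature` value, and the toy embeddings are assumptions made for this example and are not taken from the paper; the authors' actual implementation may differ.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss (hypothetical helper, not the paper's code).

    For each anchor, same-label samples are positives; every other sample
    in the batch appears in the softmax denominator.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)

    # Normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)

    # Exclude each anchor's similarity with itself from the denominator.
    sim_no_self = np.where(self_mask, -np.inf, sim)
    row_max = sim_no_self.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(sim_no_self - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = sim - log_denom

    # Positives: same label, different sample.
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive in this batch

    # Average log-probability over positives, then over valid anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(per_anchor.mean())


# Toy 2D embeddings: two tight clusters (values chosen only for illustration).
toy = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
loss_aligned = supervised_contrastive_loss(toy, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(toy, [0, 1, 0, 1])
```

In this toy case the loss is lower when the labels agree with the clusters (`loss_aligned`) than when they cut across them (`loss_mixed`), which is the behavior a contrastive objective is meant to reward: embeddings of same-class samples are pulled together in cosine space.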
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering