Lung-YOLO: Multiscale Feature Fusion Attention and Cross-Layer Aggregation for Lung Nodule Detection
Chaosheng Tang, Feifei Zhou, Junding Sun, Yudong Zhang
DOI: https://doi.org/10.1016/j.bspc.2024.106815
2025-01-01
Abstract: Objective: Lung cancer is a major public health problem worldwide, with morbidity and mortality rates among the highest of all cancers. Early diagnosis of nodules can significantly improve patient survival. This paper therefore proposes Lung-YOLO, a YOLOv6-based algorithm for nodule detection in lung CT images. Methods: First, to enable the network to detect nodules of different sizes and to minimize missed detections, we introduce the Multi-scale Dual-branch Attention (MSDA) mechanism in the feature extraction part of the network. The input features pass through successive dilated convolutions, which effectively establish long-range dependencies. The fused multiscale contextual information expands the receptive field and improves the model's ability to detect targets of different sizes; it retains precise category and location information and supports the allocation of attention weights to generate more pixel-level attention. The fused features then enter the dual-branch attention module, which shifts the model's attention to the target nodules; its dual-branch structure captures cross-dimensional interactions and models inter-dimensional dependencies, effectively improving detection performance. Second, during feature transfer, the original Bidirectional Feature Pyramid structure (RepBiFPAN) loses detailed information such as texture and color, which makes it difficult to localize target nodules accurately. To address this, we propose the Cross-layer Aggregation Module (CLAM), which aggregates the multi-level feature layers of the backbone with the multi-level detection layers of the head across layers. This preserves multi-level fine-grained information that may otherwise be lost during feature transfer and that is crucial for detecting small targets.
Finally, the modules proposed in this paper are plug-and-play and can be easily incorporated into any detection framework. Results: Our method achieves accuracy, precision, recall, and mAP of 97.5 %, 96.5 %, 96.9 %, and 97.9 %, respectively, on the LUNA16 dataset, and 95.1 %, 94.3 %, 93.4 %, and 95.9 % on the LIDC-IDRI dataset, surpassing many existing state-of-the-art detection methods. Moreover, with 30.6 M parameters, the inference time is 22.8 ms and 28.4 ms per image on the two datasets, respectively.
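The claim that stacked dilated convolutions expand the receptive field can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the function name `receptive_field`, the 3x3 kernel size, and the dilation rates 1, 2, 3 are assumptions chosen only for illustration. For stride-1 convolutions, each layer with kernel size k and dilation d adds (k - 1) * d pixels to the receptive field along one axis.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 dilated convolutions.

    Each layer widens the field by (kernel_size - 1) * dilation pixels.
    """
    rf = 1  # a single pixel sees only itself before any convolution
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical dilation rates (not the values used in Lung-YOLO).
rates = [1, 2, 3]

for depth in range(1, len(rates) + 1):
    used = rates[:depth]
    print(f"dilations {used}: receptive field = {receptive_field(3, used)}")

# For comparison: three ordinary 3x3 convolutions (dilation 1 everywhere).
print("plain 3x3 stack:", receptive_field(3, [1, 1, 1]))
```

Under these assumed rates, three dilated 3x3 layers cover 13 pixels per axis, whereas three ordinary 3x3 layers cover 7, with the same number of weights. This is the general reason dilated convolutions are a common way to gather multiscale context cheaply; the actual MSDA configuration may differ.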