SAM-FNet: SAM-Guided Fusion Network for Laryngo-Pharyngeal Tumor Detection

Jia Wei, Yun Li, Meiyu Qiu, Hongyu Chen, Xiaomao Fan, Wenbin Lei
2024-08-15
Abstract: Laryngo-pharyngeal cancer (LPC) is a highly fatal malignant disease affecting the head and neck region. Previous studies on endoscopic tumor detection, particularly those leveraging dual-branch network architectures, have made significant advances. These studies highlight the potential of dual-branch networks to improve diagnostic accuracy by effectively integrating global and local (lesion) feature extraction. However, they remain limited in their ability to accurately locate the lesion region and to capture discriminative feature information between the global and local branches. To address these issues, we propose a novel SAM-guided fusion network (SAM-FNet), a dual-branch network for laryngo-pharyngeal tumor detection. To accurately segment the lesion region, we incorporate the Segment Anything Model (SAM) into SAM-FNet, leveraging its powerful object segmentation capabilities. Furthermore, we propose a GAN-like feature optimization (GFO) module that captures the discriminative features between the global and local branches, enhancing the complementarity of the fused features. Additionally, we collect two LPC datasets from the First Affiliated Hospital (FAHSYSU) and the Sixth Affiliated Hospital (SAHSYSU) of Sun Yat-sen University. The FAHSYSU dataset is used as the internal dataset for training the model, while the SAHSYSU dataset is used as the external dataset for evaluating the model's performance. Extensive experiments on both the FAHSYSU and SAHSYSU datasets demonstrate that SAM-FNet achieves competitive results, outperforming state-of-the-art counterparts. The source code of SAM-FNet is available at https://github.com/VVJia/SAM-FNet.
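The abstract describes SAM segmenting the lesion region so that the local branch can focus on it. How the SAM mask is turned into the local-branch input is not detailed here; a minimal sketch, assuming a binary lesion mask and a bounding-box crop with a small margin, is shown below. `mask_to_bbox`, `crop_lesion`, and `margin` are hypothetical names, and plain nested lists stand in for image arrays to keep the example dependency-free.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # a 2-D image or mask stored as nested lists


def mask_to_bbox(mask: Grid) -> Optional[Tuple[int, int, int, int]]:
    """Return the inclusive (top, left, bottom, right) box around nonzero mask pixels.

    Returns None when the mask is empty or contains no foreground pixels.
    """
    if not mask or not mask[0]:
        return None
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_lesion(image: Grid, mask: Grid, margin: int = 1) -> Grid:
    """Crop `image` to the lesion box from `mask`, padded by `margin` pixels.

    The padded box is clipped to the image borders. If the mask has no
    foreground (e.g. segmentation failed), a copy of the full image is returned;
    this fallback is a hypothetical policy, not one stated in the paper.
    """
    box = mask_to_bbox(mask)
    if box is None:
        return [row[:] for row in image]
    top, left, bottom, right = box
    height, width = len(image), len(image[0])
    top, left = max(0, top - margin), max(0, left - margin)
    bottom, right = min(height - 1, bottom + margin), min(width - 1, right + margin)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Usage: a 5x5 "image" whose pixel values encode their (row, col) position.
image = [[10 * r + c for c in range(5)] for r in range(5)]
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = mask[2][3] = 1  # a tiny two-pixel "lesion"
print(mask_to_bbox(mask))           # (2, 2, 2, 3)
print(crop_lesion(image, mask, 1))  # [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]
```

In a real pipeline the mask would come from SAM's predicted segmentation and the crop would typically be resized to the local backbone's input size; both steps are outside this sketch.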
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses key challenges in detecting laryngo-pharyngeal cancer (LPC), in particular improving the accuracy of tumor detection from endoscopic images. To this end, it proposes SAM-FNet, a dual-branch network for LPC detection.

### Research Background and Issues

- **Clinical Need**: LPC is a highly fatal head and neck malignancy. Early diagnosis is crucial for treatment, potentially raising the 5-year survival rate to 90% while preserving the patient's vocal function. Currently, clinicians rely primarily on biopsy under laryngoscopy as the diagnostic gold standard, but this procedure is time-consuming and depends on the clinician's experience, which can lead to misdiagnosis or unnecessary repeat biopsies.
- **Limitations of Existing Technologies**: Existing deep learning methods for tumor detection fall into two categories: single-branch and dual-branch networks. Single-branch networks extract global features but overlook important information in local lesion areas. Dual-branch networks strengthen the model by integrating global and local features, but they still fall short in accurately locating lesion areas and in fully exploiting the complementarity of global and local features.

### Proposed Method

- **SAM-FNet Architecture**: To overcome these challenges, the researchers propose a new dual-branch network, SAM-FNet, which consists of:
  - SAM-guided Lesion Localization (SLL) module: uses the Segment Anything Model (SAM), a powerful object segmentation model, to accurately segment lesion areas.
  - Global Feature Extractor (GFE) module: extracts global features from the entire endoscopic image.
  - Local Feature Extractor (LFE) module: extracts local features from the lesion-area images generated by the SLL module.
  - GAN-like Feature Optimization (GFO) module: further captures complementary information between global and local features through an adversarial training strategy.
  - Classifier: performs classification on the global, local, and fused features.
- **Datasets**: The researchers collected two datasets, one from the First Affiliated Hospital of Sun Yat-sen University (FAHSYSU) and one from the Sixth Affiliated Hospital of Sun Yat-sen University (SAHSYSU). The FAHSYSU dataset is used for internal training and validation, while the SAHSYSU dataset is used for external testing to evaluate the model's generalization ability.

### Experimental Results

- **Performance Comparison**: On the FAHSYSU dataset, SAM-FNet outperformed advanced baseline methods such as ResNet, EfficientNet, ViT, RadFormer, and DLGNet in accuracy, precision, recall, and F1 score. In particular, it achieved a recall of 96.27% for malignant tumors, significantly higher than the second-best method.
- **External Dataset Performance**: Despite potential inconsistencies in data distribution between the internal and external datasets, SAM-FNet also performed well on the SAHSYSU dataset, demonstrating good generalization ability.

In summary, SAM-FNet improves the accuracy of LPC detection by using SAM for precise lesion localization and by applying a GAN-like optimization strategy that enhances the complementarity and discriminability of the global and local features.
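The classifier in SAM-FNet makes predictions from the global, local, and fused features. This summary does not say how those three predictions are combined during training; a minimal sketch, assuming each head outputs raw class logits and the objective is a weighted sum of the three softmax cross-entropy losses, is shown below. `three_head_loss` and its `weights` parameter are hypothetical, and equal weighting is an assumption rather than the paper's stated setting.

```python
import math
from typing import Sequence, Tuple


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one sample given raw class logits.

    Subtracts the max logit before exponentiating (log-sum-exp trick) for stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def three_head_loss(
    global_logits: Sequence[float],
    local_logits: Sequence[float],
    fused_logits: Sequence[float],
    target: int,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumption: equal weights
) -> float:
    """Hypothetical total loss: weighted sum of the per-head cross-entropies."""
    losses = (
        softmax_cross_entropy(global_logits, target),
        softmax_cross_entropy(local_logits, target),
        softmax_cross_entropy(fused_logits, target),
    )
    return sum(w * loss for w, loss in zip(weights, losses))


# Usage: uniform logits over 3 classes give log(3) per head, so 3 * log(3) in total.
uniform = [0.0, 0.0, 0.0]
print(three_head_loss(uniform, uniform, uniform, target=1))
```

Supervising each head separately, rather than only the fused output, is one common way to keep both branches individually discriminative; whether SAM-FNet weights the heads differently is not stated here.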
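The results are reported as accuracy, precision, recall, and F1 score, including a recall of 96.27% for malignant tumors. A minimal one-vs-rest sketch of these metrics for a single class is shown below. `per_class_metrics` is a hypothetical helper and the class names are illustrative; the summary does not say whether the paper's aggregate scores are macro- or weighted-averaged across classes.

```python
from typing import Dict, Sequence


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, float]:
    """One-vs-rest precision, recall and F1 for class `positive`, plus overall accuracy."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed positives
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Usage with illustrative labels: 3 of 4 malignant cases are caught (recall 0.75).
y_true = ["malignant", "malignant", "malignant", "malignant", "benign", "normal"]
y_pred = ["malignant", "malignant", "malignant", "benign", "malignant", "normal"]
print(per_class_metrics(y_true, y_pred, positive="malignant"))
```

For a clinical screening task, recall on the malignant class measures how many cancers are missed, which is why it is often highlighted separately from the averaged scores.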