ViT-Based Multi-task Learning Method for Pulmonary Embolism Detection, Localization, and Type Classification

Ammar Nassr Mohammed, Hulin Kuang, Jianxin Wang
DOI: https://doi.org/10.1007/978-981-97-5692-6_41
2024-01-01
Abstract: Pulmonary embolism (PE) is a life-threatening disease that causes a significant number of deaths annually. Accurate detection of PE from computed tomography pulmonary angiography (CTPA) scans assists healthcare professionals in making clinical decisions and allocating medical resources. In recent years, researchers have proposed many deep-learning methods for PE detection. However, detecting PE on CTPA scans remains challenging because PE blood clots and the surrounding lung tissue have similar densities. This paper proposes a novel ViT-based multi-task learning method that models local and global dependencies to detect PE and to classify its location and type from CTPA scans. A CNN-Based Feature Extraction Module (CNNFEM) is proposed to generate a cohesive feature representation that incorporates local information and to detect the presence of PE. CNNFEM comprises an EfficientNet-B7 architecture augmented with adaptive average pooling. Then, a Fe-ViT-Based Multi-task Classification Module (FVMTM) is introduced to capture global relationships within the representations, thereby improving PE detection, PE localization (left, right, or central), and PE type classification (chronic, or acute and chronic). FVMTM incorporates the Vision Transformer (ViT) model enhanced with self-attention and max pooling for weight adjustment. The proposed method is evaluated on the Radiological Society of North America (RSNA) dataset. Experimental results show that the proposed method achieves an Area Under the ROC Curve (AUC) of 96.83% for positive PE, 91.72% for negative PE, 93.73% for left PE, 95.64% for central PE, 95.22% for right PE, 71.96% for chronic PE, and 86.91% for acute and chronic PE, surpassing the performance of several state-of-the-art methods.
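The abstract reports one AUC per label rather than a single aggregate score. As a minimal sketch of how such per-label AUCs can be computed for a multi-task classifier, the Python below uses the pairwise (Mann-Whitney) definition of AUC. The label names, the toy scores, and the helper names `roc_auc` and `per_label_auc` are illustrative assumptions, not taken from the paper or from the RSNA dataset.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_label_auc(y_true, y_score):
    """Compute one AUC per task label, each treated as its own binary problem."""
    return {name: roc_auc(y_true[name], y_score[name]) for name in y_true}


# Toy ground truth and predicted probabilities (hypothetical values).
y_true = {
    "positive_pe": [1, 0, 1, 0],
    "left_pe": [1, 0, 0, 0],
}
y_score = {
    "positive_pe": [0.9, 0.2, 0.6, 0.7],
    "left_pe": [0.8, 0.3, 0.8, 0.1],
}

aucs = per_label_auc(y_true, y_score)
for name, value in aucs.items():
    print(f"{name}: {value:.4f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; on a dataset of realistic size a rank-based implementation from an established library would be the more practical choice.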