Abstract: Pulmonary Embolism (PE) represents a thrombus ("blood clot"), usually originating from a lower extremity vein, that travels to the blood vessels in the lung, causing vascular obstruction and, in some patients, death. This disorder is commonly diagnosed using Computed Tomography Pulmonary Angiography (CTPA). Deep learning holds great promise for the Computer-aided Diagnosis (CAD) of PE. However, numerous deep learning methods, such as Convolutional Neural Networks (CNNs) and Transformer-based models, exist for a given task, causing great confusion regarding the development of CAD systems for PE. To address this confusion, we present a comprehensive analysis of competing deep learning methods applicable to PE diagnosis based on four datasets. First, we use the RSNA PE dataset, which includes (weak) slice-level and exam-level labels, for PE classification and diagnosis, respectively. At the slice level, we compare CNNs with the Vision Transformer (ViT) and the Swin Transformer. We also investigate the impact of self-supervised versus (fully) supervised ImageNet pre-training, and of transfer learning versus training models from scratch. Additionally, at the exam level, we compare sequence model learning with our proposed transformer-based architecture, Embedding-based ViT (E-ViT). For the second and third datasets, we utilize the CAD-PE Challenge Dataset and Ferdowsi University of Mashhad's PE Dataset, where we convert (strong) clot-level masks into slice-level annotations to evaluate the optimal CNN model for slice-level PE classification. Finally, we use our in-house PE-CAD dataset, which contains (strong) clot-level masks. Here, we investigate the impact of our vessel-oriented image representations and self-supervised pre-training on PE false positive reduction at the clot level across image dimensions (2D, 2.5D, and 3D).
Our experiments show that (1) transfer learning boosts performance despite differences between photographic images and CTPA scans; (2) self-supervised pre-training can surpass (fully) supervised pre-training; (3) transformer-based models demonstrate comparable performance but slower convergence compared with CNNs for slice-level PE classification; (4) a model trained on the RSNA PE dataset demonstrates promising performance when tested on unseen datasets for slice-level PE classification; (5) our E-ViT framework excels in handling variable numbers of slices and outperforms sequence model learning for exam-level diagnosis; and (6) vessel-oriented image representation and self-supervised pre-training both enhance performance for PE false positive reduction across image dimensions. Our optimal approach surpasses state-of-the-art results on the RSNA PE dataset, enhancing AUC by 0.62% (slice-level) and 2.22% (exam-level). On our in-house PE-CAD dataset, 3D vessel-oriented images improve performance from 80.07% to 91.35%, a gain of more than 11 percentage points. Code is available at GitHub.com/JLiangLab/CAD_PE.
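
The conversion of (strong) clot-level masks into slice-level annotations mentioned above can be illustrated with a minimal sketch. This is not the authors' code: the function name `slice_labels_from_mask`, the `(num_slices, height, width)` axis layout, and the `min_voxels` threshold are all assumptions made for illustration. Under those assumptions, a slice is labeled PE-positive when it contains at least `min_voxels` clot voxels.

```python
import numpy as np


def slice_labels_from_mask(mask: np.ndarray, min_voxels: int = 1) -> np.ndarray:
    """Derive per-slice binary labels from a volumetric clot mask.

    Assumes (hypothetically) that `mask` has shape (num_slices, height, width)
    and that any value > 0 marks a clot voxel.
    """
    if mask.ndim != 3:
        raise ValueError("expected a 3D mask of shape (num_slices, H, W)")
    # Count clot voxels in each axial slice.
    voxels_per_slice = (mask > 0).reshape(mask.shape[0], -1).sum(axis=1)
    # A slice is positive when it holds at least `min_voxels` clot voxels.
    return (voxels_per_slice >= min_voxels).astype(np.uint8)


# Toy volume: 4 slices of 8x8 pixels, with clot voxels in slices 1 and 3.
mask = np.zeros((4, 8, 8), dtype=np.uint8)
mask[1, 2:4, 2:4] = 1  # four clot voxels
mask[3, 0, 0] = 1      # a single clot voxel
labels = slice_labels_from_mask(mask)
strict_labels = slice_labels_from_mask(mask, min_voxels=2)
```

Raising `min_voxels` is one possible way to ignore slices that only graze a clot; whether such a threshold is appropriate depends on the dataset and is not specified in the abstract.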
