Whole slide image-based weakly supervised deep learning for predicting major pathological response in non-small cell lung cancer following neoadjuvant chemoimmunotherapy: a multicenter, retrospective, cohort study

Dan Han, Hao Li, Xin Zheng, Shenbo Fu, Ran Wei, Qian Zhao, Chengxin Liu, Zhongtang Wang, Wei Huang, Shaoyu Hao
DOI: https://doi.org/10.3389/fimmu.2024.1453232
2024-09-20
Abstract: Objective: To develop a weakly supervised deep learning model that uses whole slide images (WSIs) to predict major pathological response (MPR) in patients with resectable non-small cell lung cancer (NSCLC) undergoing neoadjuvant chemoimmunotherapy (NICT). Methods: This retrospective study analyzed pre-treatment WSIs from 186 patients with NSCLC using a weakly supervised learning framework. Three deep learning architectures (DenseNet121, ResNet50, and Inception V3) were used to analyze WSIs at both the patch (micro) and slide (macro) levels. Training incorporated data augmentation and normalization to improve model robustness. We compared the performance of these models with traditional clinical predictors and integrated them with a novel pathomics signature, developed using multi-instance learning algorithms that aggregate features from patch-level probability distributions. Results: Univariate and multivariable analyses confirmed histology as a statistically significant prognostic factor for MPR (P < 0.05). Among the patch-level models, DenseNet121 achieved the highest area under the curve (AUC) in the validation set (0.656), ahead of ResNet50 (AUC = 0.626) and Inception V3 (AUC = 0.654), and generalized to external testing (AUC = 0.611). Further evaluation, including visual inspection of patch-level predictions integrated into WSIs, showed that XGBoost had the best class differentiation and generalization, with the highest AUC in training (0.998) and AUCs of 0.818 in validation and 0.805 in testing. Integrating pathomics features with clinical data into a nomogram yielded AUCs of 0.819 in validation and 0.820 in testing, enhancing discriminative accuracy. Gradient-weighted Class Activation Mapping (Grad-CAM) and feature aggregation methods improved the model's interpretability and feature modeling. Conclusion: Weakly supervised deep learning applied to WSIs offers a powerful tool for predicting MPR in NSCLC patients treated with NICT.
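The abstract describes aggregating patch-level probability distributions into slide-level features but does not give the details. The following is a minimal sketch of one common way to do this, not the authors' implementation: each slide's patch probabilities are summarized as a normalized histogram plus a few summary statistics, so that slides with different patch counts map to vectors of the same length. The function name `aggregate_patch_probs`, the bin count, and the choice of statistics are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def aggregate_patch_probs(patch_probs, n_bins=5):
    """Turn one slide's patch-level probabilities into a fixed-length vector.

    Hypothetical helper: the bin count and the summary statistics are
    illustrative choices, not values taken from the paper.
    """
    if not patch_probs:
        raise ValueError("a slide needs at least one patch probability")
    if any(p < 0.0 or p > 1.0 for p in patch_probs):
        raise ValueError("probabilities must lie in [0, 1]")

    # Normalized histogram: fraction of patches falling in each probability bin.
    counts = [0] * n_bins
    for p in patch_probs:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[idx] += 1
    hist = [c / len(patch_probs) for c in counts]

    # Summary statistics of the same distribution.
    stats = [mean(patch_probs), pstdev(patch_probs),
             min(patch_probs), max(patch_probs)]
    return hist + stats


# Two slides with different numbers of patches yield same-length vectors.
slide_a = [0.10, 0.20, 0.15, 0.90]
slide_b = [0.70, 0.80, 0.95, 0.60, 0.85, 0.75]
features_a = aggregate_patch_probs(slide_a)
features_b = aggregate_patch_probs(slide_b)
```

Vectors of this kind, one per slide, could then be passed to a slide-level classifier such as the XGBoost model mentioned in the results.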