Abstract:Background: Recent developments in artificial intelligence suggest that radiomics may represent a promising non-invasive biomarker to predict response to immune checkpoint inhibitors (ICIs). Nevertheless, validation of radiomics algorithms in independent cohorts remains a challenge due to variations in image acquisition and reconstruction. Using radiomics, we investigated the importance of scan normalization as part of a broader machine learning framework to enable model external generalizability to predict ICI response in non-small cell lung cancer (NSCLC) patients across different centers. Methods: Radiomics features were extracted and compared from 642 advanced NSCLC patients on pre-ICI scans using established open-source PyRadiomics and a proprietary DeepRadiomics deep learning technology. The population was separated into two groups: a discovery cohort of 512 NSCLC patients from three academic centers and a validation cohort that included 130 NSCLC patients from a fourth center. We harmonized images to account for variations in reconstruction kernel, slice thicknesses, and device manufacturers. Multivariable models, evaluated using cross-validation, were used to estimate the predictive value of clinical variables, PD-L1 expression, and PyRadiomics or DeepRadiomics for progression-free survival at 6 months (PFS-6). Results: The best prognostic factor for PFS-6, excluding radiomics features, was obtained with the combination of Clinical + PD-L1 expression (AUC = 0.66 in the discovery and 0.62 in the validation cohort). Without image harmonization, combining Clinical + PyRadiomics or DeepRadiomics delivered an AUC = 0.69 and 0.69, respectively, in the discovery cohort, but dropped to 0.57 and 0.52, in the validation cohort. This lack of generalizability was consistent with observations in principal component analysis clustered by CT scan parameters. Subsequently, image harmonization eliminated these clusters. The combination of Clinical + DeepRadiomics reached an AUC = 0.67 and 0.63 in the discovery and validation cohort, respectively. Conversely, the combination of Clinical + PyRadiomics failed generalizability validations, with AUC = 0.66 and 0.59. Conclusion: We demonstrated that a risk prediction model combining Clinical + DeepRadiomics was generalizable following CT scan harmonization and machine learning generalization methods. These results had similar performances to routine oncology practice using Clinical + PD-L1. This study supports the strong potential of radiomics as a future non-invasive strategy to predict ICI response in advanced NSCLC.

Robust machine learning challenge: An AIFM multicentric competition to spread knowledge, identify common pitfalls and recommend best practice

Developing an ensemble machine learning study: Insights from a multi-center proof-of-concept study

Adaptive Machine Learning Approach for Importance Evaluation of Multimodal Breast Cancer Radiomic Features

Generalization optimizing machine learning to improve CT scan radiomics and assess immune checkpoint inhibitors' response in non-small cell lung cancer: a multicenter cohort study

Radiomics for glioblastoma survival analysis in pre-operative MRI: exploring feature robustness, class boundaries, and machine learning techniques

Comparative performances of machine learning algorithms in radiomics and impacting factors

Machine and deep learning methods for radiomics

CT and MRI radiomic features of lung cancer (NSCLC): comparison and software consistency

Comparison of Machine Learning Classifiers to Predict Patient Survival and Genetics of GBM: Towards a Standardized Model for Clinical Implementation

A Machine Learning Challenge for Prognostic Modelling in Head and Neck Cancer Using Multi-modal Data

A radiomic approach for adaptive radiotherapy in non-small cell lung cancer patients

A framework for validating AI in precision medicine: considerations from the European ITFoC consortium

How Radiomics Can Improve Breast Cancer Diagnosis and Treatment

From pixels to prognosis: unveiling radiomics models with SHAP and LIME for enhanced interpretability

A Critical Analysis of the Robustness of Radiomics to Variations in Segmentation Methods in 18F-PSMA-1007 PET Images of Patients Affected by Prostate Cancer

How can we learn (more) from challenges? A statistical approach to driving future algorithm development

RadioPathomics: Multimodal Learning in Non-Small Cell Lung Cancer for Adaptive Radiotherapy

AI applications to medical images: From machine learning to deep learning

Insights into radiomics: impact of feature selection and classification

Machine and Deep Learning Prediction Of Prostate Cancer Aggressiveness Using Multiparametric MRI

Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics