Prediction and interpretation of gamma pass rate based on SHAP value feature selection

Qianxi Ni, Luqiao Chen, Jun Zhu, Jinmeng Pang, Zhiyan Wang, Xiaohua Yang
DOI: https://doi.org/10.21203/rs.3.rs-2974857/v1
2023-01-01
Abstract: Background: SHAP values have been proposed as a unique measure of feature importance in machine learning prediction models. They can explain the output of any machine learning prediction model, and they can also take part in model construction as a feature selection mechanism for high-dimensional data. In this study, SHAP values and the extreme gradient boosting (XGBoost) algorithm were combined to select the best radiomics features for building a gamma pass rate (GPR) prediction model, and the feasibility and effectiveness of the resulting model were evaluated. Methods: The measurement-based 3D dosimetric verification results of 196 pelvic intensity-modulated radiation therapy (IMRT) plans, obtained with GPR criteria of 3%/2 mm and a 10% dose threshold, were analyzed retrospectively. Radiomic features were extracted from the dose files, and the XGBoost algorithm based on SHAP values was used to select the optimal feature subset as the input for the prediction model. Four machine learning classification models were constructed with 50, 80, 110 and 140 features, respectively, and the AUC, recall and F1 score were calculated to assess their classification performance. Results: The prediction model built on the 110 features selected by SHAP values had an AUC of 0.81, a recall of 0.93 and an F1 score of 0.82, which were better than those of the other three models. Conclusion: It is feasible to use SHAP values in combination with the XGBoost algorithm to select the best subset of radiomic features for GPR prediction models. The global and single-sample explanations of the model output provided by SHAP values may serve as a reference for medical physicists in producing high-quality plans, promote the clinical application and implementation of GPR prediction models, and support safe and efficient personalized QA management for patients.
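The feature selection step described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not give implementation details, so instead of XGBoost and a SHAP library this example computes exact Shapley values by brute-force enumeration of feature coalitions for a hypothetical toy linear model, then ranks features by mean absolute SHAP value and keeps the top k. The names `shapley_values`, `rank_features`, `toy_model`, and the data are assumptions made for illustration only.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x (brute force, small n only).

    Features outside a coalition are filled in from each background row,
    and the resulting predictions are averaged.
    """
    n = len(x)

    def value(coalition):
        # Mean prediction when only the coalition's features take x's values.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(predict, samples, background):
    """Rank feature indices by mean absolute Shapley value over the samples."""
    n = len(samples[0])
    importance = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, background)):
            importance[i] += abs(p) / len(samples)
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return order, importance


def toy_model(v):
    # Hypothetical stand-in for a trained model; feature 1 has no effect.
    return 2.0 * v[0] + 0.0 * v[1] - 1.0 * v[2] + 0.25 * v[3]


# Hypothetical feature matrix: 4 samples, 4 features.
data = [
    [1.0, 5.0, 0.0, 2.0],
    [0.0, 3.0, 1.0, 0.0],
    [2.0, 1.0, 2.0, 1.0],
    [1.0, 0.0, 1.0, 3.0],
]

order, importance = rank_features(toy_model, data, data)
top_k = order[:2]  # keep the k most important features (k = 2 here)
print("ranking:", order, "selected:", top_k)
```

Brute-force enumeration grows exponentially with the number of features, so it is only practical for a handful of them. With hundreds of radiomic features, as in the study, a tree-specific SHAP algorithm applied to the fitted XGBoost model would be the realistic choice, and the candidate subset sizes (50, 80, 110, 140) would each be evaluated with a downstream classifier.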