Repeatability and Reproducibility of Computed Tomography Radiomics for Pulmonary Nodules: A Multicenter Phantom Study

Xueqing Peng, Shuyi Yang, Lingxiao Zhou, Yu Mei, Lili Shi, Rengyin Zhang, Fei Shan, Lei Liu
DOI: https://doi.org/10.1097/RLI.0000000000000834
IF: 6.7
2022-01-01
Investigative Radiology
Abstract:
Background: Radiomics can yield minable information from medical images, which can facilitate computer-aided diagnosis. However, the lack of repeatability and reproducibility of radiomic features (RFs) may hinder their generalizability in clinical applications.
Objectives: The aims of this study were to explore 3 main sources of variability in RFs, investigate the magnitudes and patterns of their influence, and identify a subset of robust RFs for further studies.
Materials and Methods: A chest phantom with nodules was scanned repeatedly on different computed tomography (CT) scanners with varying acquisition and reconstruction parameters (April-May 2019) to evaluate 3 sources of variability: test-retest, inter-CT, and intra-CT protocol variability. The robustness of the RFs was measured using the concordance correlation coefficient, dynamic range, and intraclass correlation coefficient (ICC). The magnitudes and patterns of influence were analyzed using the Friedman test and the Spearman rank correlation coefficient. Stable and informative RFs were selected, and their redundancy was eliminated using hierarchical clustering. Clinical validation was also performed to verify the clinical effectiveness of the selected RFs and their potential to enhance the generalizability of radiomics research.
Results: A total of 1295 RFs that showed all 3 sources of variability were included. The reconstruction kernel had the greatest influence (ICC, 0.35 ± 0.31) and the iteration level the least (ICC, 0.63 ± 0.27). The different sources of variability showed relatively consistent patterns of influence (false discovery rate <0.001). Finally, we obtained a subset of 19 RFs that were stable, informative, and nonredundant under all 3 sources of variability. These RFs exhibited clinical effectiveness and showed better prediction performance than unstable RFs in the validation dataset (P = 0.017, DeLong test).
Conclusions: The stability of RFs was affected to different degrees by test-retest, by differences in CT manufacturers and models, and by CT acquisition and reconstruction parameters, but the influences of these factors showed relatively consistent patterns. We also obtained a subset of 19 stable, informative, and nonredundant RFs that should preferably be used to enhance the generalizability of further radiomics research.
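The abstract names the concordance correlation coefficient as one of its robustness measures but does not say how it was computed. As an illustration only, the sketch below implements Lin's standard definition for a single feature measured twice on the same nodules. The function name `concordance_correlation` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired measurement series.

    Uses population (1/n) variances and covariance, as in Lin's original definition.
    Illustrative sketch only; not the authors' implementation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired series with at least 2 values")

    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)

    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    # The squared mean difference penalises systematic offsets between the scans.
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")

    return 2 * cov_xy / denominator


# Hypothetical feature values for four nodules on a first scan and a repeat scan.
scan_1 = [1.0, 2.0, 3.0, 4.0]
scan_2 = [2.0, 3.0, 4.0, 5.0]  # same ordering, constant offset of 1

ccc = concordance_correlation(scan_1, scan_2)
```

Unlike the Pearson correlation, which would be 1 for the offset series above, the CCC drops below 1 because it also penalises a shift in the mean. That is why it suits test-retest comparisons where absolute agreement matters.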