Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative

Mingxi Lei, Bino Varghese, Darryl Hwang, Steven Cen, Xiaomeng Lei, Afshin Azadikhah, Bhushan Desai, Assad Oberai, Vinay Duddalwar
DOI: https://doi.org/10.48550/arXiv.2006.12761
2020-06-23
Abstract: There is no consensus regarding the radiomic feature terminology, the underlying mathematics, or their implementation. This creates a scenario where features extracted using different toolboxes cannot be used to build or validate the same model, leading to a non-generalization of radiomic results. In this study, the image biomarker standardization initiative (IBSI) established phantom and benchmark values were used to compare the variation of the radiomic features while using 6 publicly available software programs and 1 in-house radiomics pipeline. All IBSI-standardized features (11 classes, 173 in total) were extracted. The relative differences between the extracted feature values from the different software and the IBSI benchmark values were calculated to measure the inter-software agreement. To better understand the variations, features were further grouped into 3 categories according to their properties: 1) morphology, 2) statistic/histogram and 3) texture features. While a good agreement was observed for a majority of radiomics features across the various programs, relatively poor agreement was observed for morphology features. Significant differences were also found in programs that use different gray level discretization approaches. Since these programs do not include all IBSI features, the level of quantitative assessment for each category was analyzed using Venn and UpSet diagrams and also quantified using two ad hoc metrics. Morphology features earn the lowest scores for both metrics, indicating that morphological features are not consistently evaluated among software programs. We conclude that radiomic features calculated using different software programs may not be identical and reliable. Further studies are needed to standardize the workflow of radiomic feature extraction.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the inconsistency of radiomic features across different software tools. Specifically:

1. **Lack of consensus on terminology, mathematical definitions, and implementations**: Radiomics toolkits share no unified standard for feature names, their mathematical definitions, or how they are implemented. As a result, features extracted with different tools cannot be used to build or validate the same model, which limits the generalizability of radiomics results.
2. **Differences in feature calculations**: Using the digital phantom and benchmark values established by the Image Biomarker Standardization Initiative (IBSI), the study compared 6 publicly available software programs and 1 in-house radiomics pipeline on the extraction of 173 IBSI-standardized features. Most radiomic features agreed well across programs, but agreement for morphological features was poor.
3. **Impact of gray-level discretization methods**: The study also found significant differences among programs that use different gray-level discretization methods. These differences may stem from issues such as an intensity shift in the pre-processing steps.
4. **Differences among feature categories**: To better understand these differences, features were divided into three categories: morphological features, statistical/histogram features, and texture features. Morphological features scored lowest, indicating that they are not evaluated consistently across software.
5. **Need for standardization**: Because radiomic features calculated by different software programs may not be identical or reliable, the paper concludes that further research is needed to standardize the radiomic feature extraction workflow.
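To make the discretization point concrete, here is a minimal Python sketch of the two gray-level discretization schemes (fixed bin number and fixed bin width) given in the formula summary. The function names, the `minimum` parameter, and the toy intensity values are hypothetical and chosen only for illustration; they are not taken from the paper or from any of the compared toolkits.

```python
import math


def discretize_fixed_bin_number(intensities, n_bins):
    """Fixed bin number: map intensities to bins 1..n_bins over the ROI range."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # Assumption: a constant ROI is placed entirely in bin 1.
        return [1 for _ in intensities]
    discretized = []
    for x in intensities:
        if x < hi:
            discretized.append(math.floor(n_bins * (x - lo) / (hi - lo)) + 1)
        else:
            # The maximum intensity is assigned to the last bin.
            discretized.append(n_bins)
    return discretized


def discretize_fixed_bin_width(intensities, bin_width, minimum=None):
    """Fixed bin width: bins of width `bin_width`, counted from a minimum.

    `minimum` is a hypothetical parameter: by default the ROI minimum is
    used, as in the formula; passing another value mimics an intensity shift.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    lo = min(intensities) if minimum is None else minimum
    return [math.floor((x - lo) / bin_width) + 1 for x in intensities]


# Toy ROI intensities (illustrative values only).
roi = [0, 10, 20, 30, 40, 50]

print(discretize_fixed_bin_number(roi, n_bins=5))               # [1, 2, 3, 4, 5, 5]
print(discretize_fixed_bin_width(roi, bin_width=25))            # [1, 1, 1, 2, 2, 3]
print(discretize_fixed_bin_width(roi, bin_width=25, minimum=-10))  # [1, 1, 2, 2, 3, 3]
```

The same intensities land in different bins depending on the scheme and on the origin of the bins, so any histogram or texture feature computed from the discretized values can differ between programs even when the input image is identical.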
### Formula summary

- **Gray-level discretization formula**:
  - Fixed bin number:
    \[ X_{d,k}=\begin{cases}\left\lfloor\frac{N_g(X_{gl,k}-X_{gl,\min})}{X_{gl,\max}-X_{gl,\min}}\right\rfloor + 1 & \text{if } X_{gl,k}<X_{gl,\max}\\ N_g & \text{if } X_{gl,k}=X_{gl,\max}\end{cases} \]
  - Fixed bin width:
    \[ X_{d,k}=\left\lfloor\frac{X_{gl,k}-X_{gl,\min}}{w_b}\right\rfloor + 1 \]
- **Relative difference formula**:
  \[ \text{relative difference}=\frac{\vert\text{feature value}-\text{benchmark value}\vert}{\text{benchmark value}} \]
- **Popularity index formula**:
  - Popularity 1:
    \[ P_1=\frac{\sum_{i=1}^{d}w_i}{6d},\quad 0\leq P_1\leq 1 \]
  - Popularity 2:
    \[ P_2=\frac{\sum_{i=1}^{d}\mathbf{1}(w_i>4)}{d},\quad 0\leq P_2\leq 1 \]

### Conclusion

This study emphasizes the consistency and differences in feature calculations among different radiomics software and points out the problems existing in the calculation of morphological features. To improve the reliability and reproducibility of radiomics analysis, future research should be dedicated to standardizing the feature extraction workflow and ensuring consistency among different software.
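The relative difference and the two popularity indices from the formula summary can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the summary does not define \(w_i\) or \(d\), so the sketch assumes \(w_i\) is the number of programs (out of 6) that implement feature \(i\) and \(d\) is the number of features in a category. The denominator of the relative difference uses the absolute benchmark value, an added assumption that keeps the result non-negative and matches the formula whenever the benchmark is positive. All function names and sample values are hypothetical.

```python
def relative_difference(feature_value, benchmark_value):
    """Relative difference between an extracted value and its benchmark."""
    if benchmark_value == 0:
        raise ValueError("relative difference is undefined for a zero benchmark")
    # Assumption: abs() in the denominator; identical to the formula for
    # positive benchmark values.
    return abs(feature_value - benchmark_value) / abs(benchmark_value)


def popularity_1(support_counts, n_programs=6):
    """P1: mean fraction of programs supporting each feature in a category."""
    d = len(support_counts)
    if d == 0:
        raise ValueError("category has no features")
    return sum(support_counts) / (n_programs * d)


def popularity_2(support_counts, threshold=4):
    """P2: fraction of features supported by more than `threshold` programs."""
    d = len(support_counts)
    if d == 0:
        raise ValueError("category has no features")
    return sum(1 for w in support_counts if w > threshold) / d


# Hypothetical category with four features; each entry is the assumed
# number of programs (out of 6) that implement that feature.
support = [6, 5, 2, 0]

print(relative_difference(110.0, 100.0))  # 0.1
print(popularity_1(support))              # 13 / 24
print(popularity_2(support))              # 0.5
```

Under this reading, a low P1 or P2 for a category means few programs implement its features, which is how the summary characterizes the morphology category.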