The impact of high-order features on performance of radiomics studies in CT non-small cell lung cancer

Gary Ge, Jason Z Zhang, Jie Zhang
DOI: https://doi.org/10.1016/j.clinimag.2024.110244
Abstract: High-order radiomic features have been shown to produce high-performance models in a variety of scenarios. However, models trained without high-order features have shown similar performance, raising the question of whether high-order features are worth including given their increased computational burden. This comparative study investigates the impact of high-order features on model performance in CT-based non-small cell lung cancer (NSCLC) and the potential uncertainty regarding their application in machine learning. Three categories of features were retrospectively retrieved from CT images of 347 NSCLC patients: first- and second-order statistical features, morphological features, and transform (high-order) features. From these, three datasets were constructed: a "low-order" dataset (Lo), which included the first-order, second-order, and morphological features; a high-order dataset (Hi); and a combined dataset (Combo). A diverse selection of datasets, feature selection methods, and predictive models was included for the uncertainty analysis, with two-year survival as the study endpoint. AUC values were calculated for comparisons, and Kruskal-Wallis testing was performed to determine significant differences. The Hi (AUC: 0.41-0.62) and Combo (AUC: 0.41-0.62) datasets generated significantly (P < 0.01) higher model performance than the Lo dataset (AUC: 0.42-0.58). High-order features were selected more often than low-order features for model training, comprising 87% of selected features in the Combo dataset. High-order features are a source of data that can improve machine learning model performance. However, their impact strongly depends on various factors that may lead to inconsistent results. A clear approach to incorporating high-order features in radiomic studies requires further investigation.
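The comparison described in the abstract (an AUC per trained model, then a Kruskal-Wallis test across the Lo, Hi, and Combo groups) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the function names `average_ranks`, `auc`, and `kruskal_wallis_3` are hypothetical, and the AUC lists at the bottom are invented toy numbers, not results from the study. In practice one would more likely use an established statistics library; the hand-written version here only makes the arithmetic visible.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc(labels, scores):
    """AUC from the rank-sum (Mann-Whitney) identity; labels are 0 or 1."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def kruskal_wallis_3(a, b, c):
    """Tie-corrected Kruskal-Wallis H and p-value for exactly three groups."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    # With three groups, H is compared to chi-square with 2 degrees of
    # freedom, whose survival function is exactly exp(-h / 2).
    return h, math.exp(-h / 2)


# Toy AUC values per dataset (invented for illustration only).
lo_aucs = [0.50, 0.52, 0.54, 0.55, 0.57]
hi_aucs = [0.56, 0.58, 0.59, 0.60, 0.61]
combo_aucs = [0.55, 0.58, 0.60, 0.61, 0.62]

h_stat, p_value = kruskal_wallis_3(lo_aucs, hi_aucs, combo_aucs)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

The closed-form p-value holds only because three groups give 2 degrees of freedom; a comparison across a different number of groups would need a general chi-square survival function.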