Task-based assessment for neural networks: evaluating undersampled MRI reconstructions based on human observer signal detection

Joshua D Herman,Rachel E Roca,Alexandra G O'Neill,Marcus L Wong,Sajan Goud Lingala,Angel R Pineda
DOI: https://doi.org/10.1117/1.JMI.11.4.045503
Abstract:Purpose: Recent research explores using neural networks to reconstruct undersampled magnetic resonance imaging. Because of the complexity of the artifacts in the reconstructed images, there is a need to develop task-based approaches to image quality. We compared conventional global quantitative metrics to evaluate image quality in undersampled images generated by a neural network with human observer performance in a detection task. The purpose is to study which acceleration (2×, 3×, 4×, 5×) would be chosen with the conventional metrics and compare it to the acceleration chosen by human observer performance. Approach: We used common global metrics for evaluating image quality: the normalized root mean squared error (NRMSE) and structural similarity (SSIM). These metrics are compared with a measure of image quality that incorporates a subtle signal for a specific task to allow for image quality assessment that locally evaluates the effect of undersampling on a signal. We used a U-Net to reconstruct under-sampled images with 2×, 3×, 4×, and 5× one-dimensional undersampling rates. Cross-validation was performed for a 500- and a 4000-image training set with both SSIM and MSE losses. A two-alternative forced choice (2-AFC) observer study was carried out for detecting a subtle signal (small blurred disk) from images with the 4000-image training set. Results: We found that for both loss functions, the human observer performance on the 2-AFC studies led to a choice of a 2× undersampling, but the SSIM and NRMSE led to a choice of a 3× undersampling. Conclusions: For this detection task using a subtle small signal at the edge of detectability, SSIM and NRMSE led to an overestimate of the achievable undersampling using a U-Net before a steep loss of image quality between 2×, 3×, 4×, 5× undersampling rates when compared to the performance of human observers in the detection task.
What problem does this paper attempt to address?