Nicholas Konz, Yuwen Chen, Hanxue Gu, Haoyu Dong, Yaqian Chen, Maciej A. Mazurowski
Abstract: Determining whether two sets of images belong to the same or different domains is a crucial task in modern medical image analysis and deep learning, where domain shift is a common problem that often degrades model performance. This determination is also important for evaluating the output quality of generative models, e.g., image-to-image translation models used to mitigate domain shift. Current metrics for this either rely on the (potentially biased) choice of some downstream task such as segmentation, or adopt task-independent perceptual metrics (e.g., FID) from natural imaging, which insufficiently capture anatomical consistency and realism in medical images. We introduce a new perceptual metric tailored for medical images: Radiomic Feature Distance (RaD), which utilizes standardized, clinically meaningful and interpretable image features. We show that RaD is superior to other metrics for out-of-domain (OOD) detection in a variety of experiments. Furthermore, RaD outperforms previous perceptual metrics (FID, KID, etc.) for image-to-image translation by correlating more strongly with downstream task performance as well as anatomical consistency and realism, and shows similar utility for evaluating unconditional image generation. RaD also offers additional benefits such as interpretability, as well as stability and computational efficiency at low sample sizes. Our results are supported by broad experiments spanning four multi-domain medical image datasets, nine downstream tasks, six image translation models, and other factors, highlighting the broad potential of RaD for medical image analysis.
Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing
### What problem does this paper attempt to solve?
The paper addresses how to determine whether two sets of images belong to the same domain or to different domains, a recurring question in medical image analysis and deep learning. It focuses on four aspects:
1. **Domain Shift Problem**:
- Domain shift is common in medical image analysis and usually degrades model performance. For example, a diagnostic model trained on images from one hospital or site may perform worse when tested on images from another.
- The paper proposes a new metric, Radiomic Feature Distance (RaD), for comparing medical image distributions and thereby detecting domain shift more effectively.
2. **Evaluation of Generative Model Output Quality**:
- For image-to-image translation models (such as inter-modality or inter-sequence translation), existing metrics such as FID may not fully capture anatomical consistency and realism in medical images.
- The paper uses RaD to evaluate the output quality of these generative models with a metric designed for medical images.
3. **Task-Independent Perceptual Metric**:
- Some existing metrics depend on a downstream task (such as segmentation). The choice of task can bias the evaluation, and the task itself requires costly training and annotation.
- RaD is a task-independent perceptual metric, so it can assess medical image quality without committing to a specific downstream task.
4. **Interpretability and Stability**:
- Existing metrics (such as FID and RadFID) are unstable at small sample sizes and difficult to interpret.
- RaD is more stable and more interpretable because it is built on predefined, clinically meaningful radiomic features.
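
The idea of comparing two image sets through the distributions of their features can be sketched in a few lines. The example below is a minimal illustration, not the paper's implementation: `first_order_features` is a hypothetical stand-in that computes three simple intensity statistics rather than a standardized radiomic feature set, the z-scoring against the first set is an assumption, and the distance is a Fréchet distance between Gaussians with diagonal covariance, chosen here only because it needs nothing beyond the standard library. The synthetic "sites" are likewise made up for the demonstration.

```python
import math
import random
from statistics import fmean, pstdev


def first_order_features(image):
    """Hypothetical stand-in for a radiomic extractor: three intensity statistics."""
    pixels = [p for row in image for p in row]
    return [fmean(pixels), pstdev(pixels), max(pixels) - min(pixels)]


def column_stats(features):
    """Per-feature mean and population standard deviation across a set of images."""
    columns = list(zip(*features))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]


def feature_distance(images_a, images_b):
    """Distance between the feature distributions of two image sets (illustrative)."""
    feats_a = [first_order_features(im) for im in images_a]
    feats_b = [first_order_features(im) for im in images_b]

    # Assumption: z-score both sets with set A's statistics so features share a scale.
    ref_mu, ref_sd = column_stats(feats_a)

    def zscore(feats):
        return [[(v - m) / (s or 1.0) for v, m, s in zip(f, ref_mu, ref_sd)] for f in feats]

    mu_a, sd_a = column_stats(zscore(feats_a))
    mu_b, sd_b = column_stats(zscore(feats_b))

    # Fréchet distance between two Gaussians with diagonal covariance.
    squared = sum(
        (ma - mb) ** 2 + (sa - sb) ** 2
        for ma, mb, sa, sb in zip(mu_a, mu_b, sd_a, sd_b)
    )
    return math.sqrt(squared)


def make_images(n, mean, spread, seed, size=8):
    """Synthetic 'site': n images of size x size Gaussian noise."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(mean, spread) for _ in range(size)] for _ in range(size)]
        for _ in range(n)
    ]


site_a = make_images(30, mean=0.5, spread=0.1, seed=0)
site_a_again = make_images(30, mean=0.5, spread=0.1, seed=1)
site_b = make_images(30, mean=0.8, spread=0.2, seed=2)

d_same = feature_distance(site_a, site_a_again)
d_shift = feature_distance(site_a, site_b)
print(f"same site: {d_same:.2f}  shifted site: {d_shift:.2f}")
```

On this synthetic data the shifted set should score far higher than the second sample drawn from the same site. A real pipeline would replace the toy extractor with an actual radiomic feature extractor and would not necessarily use the diagonal simplification.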
### Main Contributions
1. **Identifying the Deficiencies of Commonly Used Metrics**: The paper highlights the limitations of existing metrics (such as FID) for comparing medical image distributions, in particular their failure to meet the specific needs of medical imaging.
2. **Introducing the RaD Metric**: The paper proposes RaD, a task-independent perceptual metric based on radiomic features, which improves on previous metrics in agreement with downstream task performance, stability at small sample sizes, computational efficiency, and clinical interpretability.
3. **Extensive Experimental Validation**: Experiments spanning multiple medical image datasets, downstream tasks, image translation models, and generative models demonstrate the effectiveness of RaD for out-of-domain detection, image translation evaluation, and image generation evaluation.
### Conclusion
With RaD, the paper provides a more effective, more stable, and more interpretable metric for comparing unpaired medical image distributions, addressing the limitations of existing metrics in medical image analysis.