Abstract: Determining whether two sets of images belong to the same or different domains is a crucial task in modern medical image analysis and deep learning, where domain shift is a common problem that often results in decreased model performance. This determination is also important for evaluating the output quality of generative models, e.g., image-to-image translation models used to mitigate domain shift. Current metrics for this either rely on the (potentially biased) choice of some downstream task such as segmentation, or adopt task-independent perceptual metrics (e.g., FID) from natural imaging, which insufficiently capture anatomical consistency and realism in medical images. We introduce a new perceptual metric tailored for medical images: Radiomic Feature Distance (RaD), which utilizes standardized, clinically meaningful and interpretable image features. We show that RaD is superior to other metrics for out-of-domain (OOD) detection in a variety of experiments. Furthermore, RaD outperforms previous perceptual metrics (FID, KID, etc.) for image-to-image translation by correlating more strongly with downstream task performance as well as anatomical consistency and realism, and shows similar utility for evaluating unconditional image generation. RaD also offers additional benefits such as interpretability, as well as stability and computational efficiency at low sample sizes. Our results are supported by broad experiments spanning four multi-domain medical image datasets, nine downstream tasks, six image translation models, and other factors, highlighting the broad potential of RaD for medical image analysis.
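
The abstract describes RaD as a distance between two image sets computed on radiomic features and positions it against FID. As a rough illustration only, the sketch below assumes an FID-style formulation: each image is mapped to a feature vector, features are z-scored, and the two sets are compared with the Fréchet distance between Gaussian fits, d² = ‖μ_A − μ_B‖² + Tr(Σ_A + Σ_B − 2(Σ_A Σ_B)^(1/2)). The extractor `first_order_features` and the helper `rad_like_distance` are hypothetical stand-ins using a handful of first-order intensity statistics; a real implementation would use a radiomics library such as PyRadiomics, and the paper's exact feature set, normalization, and distance may differ.

```python
import numpy as np
from scipy import linalg


def first_order_features(img: np.ndarray, bins: int = 32) -> np.ndarray:
    """Hypothetical stand-in for a radiomics extractor: a few first-order
    intensity statistics over the whole image (no mask, not the paper's set)."""
    x = img.astype(np.float64).ravel()
    mean, std = x.mean(), x.std()
    z = (x - mean) / (std + 1e-12)
    skewness = np.mean(z**3)
    kurtosis = np.mean(z**4)
    p10, p90 = np.percentile(x, [10, 90])
    hist, _ = np.histogram(x, bins=bins)
    p = hist[hist > 0] / hist.sum()
    entropy = -np.sum(p * np.log2(p))
    return np.array([mean, std, skewness, kurtosis, p10, p90, entropy])


def frechet_distance(fa: np.ndarray, fb: np.ndarray, eps: float = 1e-6) -> float:
    """Fréchet distance between Gaussians fitted to two (n_i x d) feature matrices."""
    mu_a, mu_b = fa.mean(axis=0), fb.mean(axis=0)
    cov_a = np.cov(fa, rowvar=False)
    cov_b = np.cov(fb, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if not np.all(np.isfinite(covmean)):
        # Regularize near-singular covariances, as FID implementations commonly do.
        offset = eps * np.eye(cov_a.shape[0])
        covmean = linalg.sqrtm((cov_a + offset) @ (cov_b + offset))
    covmean = np.real(covmean)  # drop tiny imaginary parts caused by round-off
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(covmean))


def rad_like_distance(images_a: np.ndarray, images_b: np.ndarray) -> float:
    """Extract per-image features, z-score them with pooled statistics
    (an assumption), then compare the two sets with the Fréchet distance."""
    fa = np.stack([first_order_features(im) for im in images_a])
    fb = np.stack([first_order_features(im) for im in images_b])
    pooled = np.vstack([fa, fb])
    mu, sd = pooled.mean(axis=0), pooled.std(axis=0) + 1e-12
    return frechet_distance((fa - mu) / sd, (fb - mu) / sd)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    site_a = rng.normal(100.0, 20.0, size=(64, 32, 32))   # synthetic "domain A"
    site_a2 = rng.normal(100.0, 20.0, size=(64, 32, 32))  # second sample, same domain
    site_b = rng.normal(130.0, 35.0, size=(64, 32, 32))   # synthetic shifted domain
    print("A vs A':", rad_like_distance(site_a, site_a2))
    print("A vs B :", rad_like_distance(site_a, site_b))
```

Standardizing features before the Fréchet step keeps statistics on very different scales (e.g., raw intensities vs. entropy in bits) from dominating the distance; with synthetic data, the same-domain pair should score far lower than the shifted pair.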


### What problem does this paper attempt to solve?

This paper addresses a basic question in modern medical image analysis and deep learning: how to determine whether two sets of images belong to the same domain or to different domains. Specifically, it focuses on the following aspects:

1. **Domain shift**
   - Domain shift is common in medical image analysis and usually degrades model performance. For example, a diagnostic model trained on images from one hospital or acquisition site may perform worse when tested on images from another.
   - The paper proposes a new metric, Radiomic Feature Distance (RaD), for comparing medical image distributions and thereby detecting domain shift more effectively.
2. **Evaluating the output quality of generative models**
   - For image-to-image translation models (e.g., cross-modality or cross-sequence translation), existing metrics such as FID may not adequately capture anatomical consistency and realism in medical images.
   - The paper uses RaD to evaluate the outputs of such models with a metric designed specifically for medical images.
3. **A task-independent perceptual metric**
   - Some existing evaluation approaches depend on a downstream task (such as segmentation), which can introduce bias and requires costly model training and annotation.
   - As a task-independent perceptual metric, RaD reflects the quality of medical images without relying on any specific downstream task.
4. **Interpretability and stability**
   - Existing metrics (such as FID and RadFID) are unstable on small sample sets and hard to interpret.
   - RaD is more stable and easier to interpret because it is built on predefined, clinically meaningful radiomic features.

### Main Contributions

1. **Identifying the shortcomings of commonly used metrics**: The paper highlights the limitations of existing metrics (such as FID) for comparing medical image distributions, particularly with respect to the specific needs of medical imaging.
2. **Introducing the RaD metric**: The paper proposes RaD, a task-independent perceptual metric based on radiomic features, which outperforms previous metrics in several respects: agreement with downstream task performance, stability on small sample sets, computational efficiency, and clinical interpretability.
3. **Extensive experimental validation**: Experiments spanning four multi-domain medical image datasets, nine downstream tasks, six image translation models, and unconditional image generation models demonstrate the effectiveness of RaD, especially for out-of-domain detection, image translation, and image generation.

### Conclusion

By introducing RaD, the paper provides a more effective, more stable, and more interpretable metric for comparing unpaired medical image distributions, addressing limitations of existing methods in medical image analysis.
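
The summary credits RaD's interpretability to its use of predefined, named radiomic features. One simple way to see what that enables, sketched here under our own assumptions rather than as the paper's procedure, is to rank individual features by their standardized mean difference (Cohen's d with pooled standard deviation) between two image sets, so that a large set-level distance can be traced back to specific named features. The function `rank_feature_shifts` is hypothetical, the feature names only imitate PyRadiomics-style naming, and the feature matrices are synthetic.

```python
import numpy as np


def rank_feature_shifts(feats_a: np.ndarray, feats_b: np.ndarray,
                        names: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank features by |Cohen's d| between two sets (rows = images, cols = features)."""
    n_a, n_b = len(feats_a), len(feats_b)
    var_a = feats_a.var(axis=0, ddof=1)
    var_b = feats_b.var(axis=0, ddof=1)
    # Pooled standard deviation across both sets, per feature.
    pooled_sd = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    d = (feats_b.mean(axis=0) - feats_a.mean(axis=0)) / (pooled_sd + 1e-12)
    order = np.argsort(-np.abs(d))[:top_k]
    return [(names[i], float(d[i])) for i in order]


if __name__ == "__main__":
    # Illustrative, PyRadiomics-style names; the data are synthetic.
    names = ["firstorder_Mean", "firstorder_Entropy", "glcm_Contrast", "glszm_ZoneEntropy"]
    rng = np.random.default_rng(0)
    feats_a = rng.normal(0.0, 1.0, size=(50, 4))
    feats_b = rng.normal(0.0, 1.0, size=(50, 4))
    feats_b[:, 2] += 2.0  # simulate a shift in texture contrast only
    for name, d in rank_feature_shifts(feats_a, feats_b, names):
        print(f"{name:>22s}  d = {d:+.2f}")
```

Because each column corresponds to a named, predefined feature, the top-ranked entries point directly at which image properties differ between the two sets, which is harder to do with learned deep-network embeddings.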

RaD: A Metric for Medical Image Distribution Comparison in Out-of-Domain Detection and Other Applications