Evaluation of Multi-indicator And Multi-organ Medical Image Segmentation Models

Qi Ye, Lihua Guo
2023-06-01
Abstract: In recent years, "U-shaped" neural networks featuring encoder and decoder structures have gained popularity in the field of medical image segmentation, and many variants of this model have been developed. Nevertheless, the evaluation of these models has received less attention than model development. In response, we propose a comprehensive multi-indicator and multi-organ method for evaluating medical image segmentation models (named MIMO). MIMO allows models to generate independent thresholds, which are then combined with multi-indicator evaluation and confidence estimation to screen and measure each organ. As a result, MIMO offers detailed information on the segmentation of each organ in each sample, thereby aiding developers in analyzing and improving the model. Additionally, MIMO can produce concise usability and comprehensiveness scores for different models. Models with higher scores are deemed better, which is convenient for clinical evaluation. Our research tests eight different medical image segmentation models on two abdominal multi-organ datasets and evaluates them from four perspectives: correctness, confidence estimation, Usable Region and MIMO. Robustness experiments are also conducted. Experimental results demonstrate that MIMO offers novel insights into multi-indicator and multi-organ medical image evaluation and provides a specific and concise measure of the usability and comprehensiveness of the model. Code: https://github.com/SCUT-ML-GUO/MIMO
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the lack of adequate evaluation methods for multi-indicator, multi-organ medical image segmentation models. Although "U-shaped" neural networks have achieved significant success in medical image segmentation, and various variants (such as Attention U-Net and nnU-Net) have emerged, the evaluation of these models has received less attention than their development. Traditional evaluation focuses mainly on accuracy metrics (such as the Dice coefficient and the Hausdorff distance). These metrics are important, but they are not sufficient to assess how practical a model is in clinical use.
The paper proposes a new multi-indicator and multi-organ evaluation method for medical image segmentation models (MIMO). MIMO lets the model generate independent thresholds and screens the organs in each sample by jointly ranking prediction correctness indices and confidence estimates. It then reports whether each organ in each sample meets the standard, which supports subsequent analysis and model improvement. Additionally, MIMO outputs usability and comprehensiveness scores in the form of "regions" so that different models can be compared intuitively.
The main contributions of the paper include:
1. Proposing a new multi-indicator and multi-organ medical image segmentation model evaluation method that allows the model to automatically generate thresholds and screen sample organs through these thresholds.
2. Evaluating the thresholds of each organ under each evaluation metric using the Bootstrap algorithm.
3. Providing detailed methods for calculating usability and comprehensiveness scores, helping developers better analyze and improve models.
4. Validating the effectiveness of the proposed method and demonstrating its advantages in robustness by testing and evaluating eight different medical image segmentation models on two public datasets.
Through this method, the researchers hope to promote more clinically oriented model evaluation and development.
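To make the idea of per-organ scores and bootstrap-derived thresholds more concrete, the following is a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). The function names `dice_per_organ` and `bootstrap_threshold`, the convention that an organ absent from both maps scores 1.0, and the rule of taking a lower quantile of bootstrap means as the threshold are all assumptions made for illustration only.

```python
import numpy as np


def dice_per_organ(pred, target, organ_labels):
    """Dice coefficient for each organ label in two integer label maps."""
    scores = {}
    for label in organ_labels:
        p = pred == label
        t = target == label
        denom = p.sum() + t.sum()
        if denom == 0:
            # Assumed convention: organ absent from both maps counts as perfect.
            scores[label] = 1.0
        else:
            scores[label] = float(2.0 * np.logical_and(p, t).sum() / denom)
    return scores


def bootstrap_threshold(scores, n_resamples=1000, quantile=0.05, seed=0):
    """Hypothetical threshold rule: a lower quantile of bootstrap means.

    `scores` holds one organ's metric values across samples. The paper's
    actual thresholding rule may differ; this only illustrates resampling.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Draw n_resamples resamples (with replacement) of the same size as the data.
    idx = rng.integers(0, len(scores), size=(n_resamples, len(scores)))
    means = scores[idx].mean(axis=1)
    return float(np.quantile(means, quantile))


if __name__ == "__main__":
    # Toy 2x2 label maps: 0 = background, 1 and 2 = two organs.
    pred = np.array([[1, 1], [2, 0]])
    target = np.array([[1, 0], [2, 2]])
    print(dice_per_organ(pred, target, organ_labels=[1, 2]))

    # Made-up Dice values for one organ across several samples.
    organ_scores = [0.91, 0.88, 0.93, 0.79, 0.90, 0.86]
    threshold = bootstrap_threshold(organ_scores)
    passed = [s >= threshold for s in organ_scores]
    print(threshold, passed)
```

In this sketch each organ gets its own threshold from its own score distribution, so an organ that is generally hard to segment is not judged against the same cutoff as an easy one. The same pattern could be repeated for other indicators (for example a distance-based metric), with the comparison direction flipped where lower values are better.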