Bolstering Performance Evaluation of Image Segmentation Models with Efficacy Metrics in the Absence of a Gold Standard

Lina Tang, Jinyuan Shao, Shiyan Pang, Yameng Wang, Aaron Maxwell, Xiangyun Hu, Zhi Gao, Ting Lan, Guofan Shao
DOI: https://doi.org/10.1109/tgrs.2024.3446950
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Image segmentation using deep learning has become overwhelmingly widespread. However, routine model testing methods can encounter evaluation inconsistencies or bias, largely due to how accuracy metrics respond to variations in class share distribution. Here, we address the effects of class imbalance on model performance evaluation and demonstrate a refined approach that incorporates image classification efficacy (ICE) metrics within the context of semantic segmentation in remote sensing. This evaluation approach was applied in six segmentation experiments that involved multispectral and LiDAR data, single or multiple models tested with the same or different datasets, and binary and multiclass schemes. ICE metrics revealed unique aspects of a model's segmentation capabilities compared to precision, recall, F-score, and overall accuracy. By mitigating the class imbalance effect, per-class efficacy enables precise class-level optimization of segmentation models, while whole-class efficacy facilitates evaluating a model's potential performance when adapted to new datasets. The suitability of the kappa coefficient, ROC-AUC, and PR-AUC for model evaluation under class imbalance was discussed in comparison with ICE metrics. This efficacy-enhanced model evaluation protocol can be implemented for deep learning model training and testing. The routine use of this evaluation approach will strengthen the dependability and applicability of segmentation tools in various fields.
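As a rough illustration of the class-share sensitivity the abstract describes, the sketch below computes the conventional metrics it names (precision, recall, F-score, overall accuracy, and Cohen's kappa) from a binary confusion matrix, using their standard textbook definitions. It does not implement the ICE metrics, which the abstract does not define. The helper names `binary_metrics` and `confusion_from_rates` and all pixel counts are hypothetical and chosen only for demonstration; they are not taken from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from a binary confusion matrix (pixel counts)."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    overall_accuracy = (tp + tn) / n
    # Agreement expected by chance, as used in Cohen's kappa.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (overall_accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
    }


def confusion_from_rates(n_pos, n_neg, recall, specificity):
    """Hypothetical helper: build (tp, fp, fn, tn) for a model whose
    per-class hit rates are fixed, so only the class share varies."""
    tp = round(recall * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_neg - tn, n_pos - tp, tn


# Same model behaviour (recall 0.8, specificity 0.9) on two class shares.
balanced = binary_metrics(*confusion_from_rates(500, 500, 0.8, 0.9))
imbalanced = binary_metrics(*confusion_from_rates(50, 950, 0.8, 0.9))

# A degenerate model that labels every pixel as background.
all_background = binary_metrics(tp=0, fp=0, fn=50, tn=950)

for name, result in [("balanced", balanced),
                     ("imbalanced", imbalanced),
                     ("all background", all_background)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In this toy setup, recall is identical in the balanced and imbalanced cases because the per-class hit rates are held fixed, while precision and overall accuracy shift with the class share alone. The all-background model reaches a high overall accuracy despite detecting nothing, which is the kind of inconsistency that motivates class-share-aware evaluation.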