Performance Assessment of Artificial Intelligence Medical Device Software Using Synthetic Data.

Hao Wang, Xiangfeng Meng, Chao Zhang, Jiage Li
DOI: https://doi.org/10.1109/RCAR52367.2021.9517526
2021-01-01
Abstract: Objective: This study explores an approach to extending the algorithm assessment of artificial intelligence medical device software. Method: Clinical fundus photos are collected, with approval from ethical review boards, as a baseline test set. Mathematical models are developed to augment the test set and simulate several types of image variation, including detector modification, focusing adjustment, and illumination fluctuation. The synthetic test sets are applied in testing artificial intelligence medical device software, using sensitivity and specificity as the major metrics. Result: Under detector modification, sensitivity and specificity both drop by 2% during the test. Under focusing variation, sensitivity and specificity change by 25% and 15%, respectively. Under illumination variation, the maximum fluctuation of sensitivity and specificity is 15% and 7.5%, respectively. Conclusion: In this paper, fundus photos are augmented in a white-box manner to simulate image variation in the real world. Algorithm performance on the augmented test sets shows significant fluctuation. This testing approach may help better reveal the weaknesses of artificial intelligence medical device software and clarify its robustness.
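The testing loop described in the abstract (perturb the baseline test set, re-run the algorithm, compare sensitivity and specificity) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names, the simple gain-based illumination model, and the brightness-threshold `toy_classifier` are all hypothetical stand-ins. Detector and focus variation would plug in as additional transforms of the same shape.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a grayscale image is a list of rows with pixel values in [0, 1].
Image = List[List[float]]


def scale_illumination(image: Image, gain: float) -> Image:
    """Hypothetical illumination model: multiply pixels by `gain`, clip to [0, 1]."""
    return [[min(1.0, max(0.0, p * gain)) for p in row] for row in image]


def sensitivity_specificity(
    labels: Sequence[int], predictions: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = disease."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    # NaN signals that a class is absent from the test set.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def evaluate(
    classifier: Callable[[Image], int],
    images: Sequence[Image],
    labels: Sequence[int],
    transform: Callable[[Image], Image],
) -> Tuple[float, float]:
    """Apply `transform` to every test image, then score the classifier."""
    predictions = [classifier(transform(img)) for img in images]
    return sensitivity_specificity(labels, predictions)


def toy_classifier(image: Image) -> int:
    """Hypothetical stand-in for the device software: flags bright images."""
    pixels = [p for row in image for p in row]
    return 1 if sum(pixels) / len(pixels) > 0.5 else 0


if __name__ == "__main__":
    # Tiny fabricated-for-demo images; they carry no clinical meaning.
    images = [[[0.8, 0.8], [0.8, 0.8]], [[0.2, 0.2], [0.2, 0.2]]]
    labels = [1, 0]
    for gain in (1.0, 0.75, 0.5):
        sens, spec = evaluate(
            toy_classifier, images, labels, lambda im: scale_illumination(im, gain)
        )
        print(f"gain={gain}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping the perturbation parameter and recording the two metrics at each setting gives the kind of fluctuation curve the abstract summarizes; with a real device, `toy_classifier` would be replaced by a call to the software under test.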