Assessment of image quality on the diagnostic performance of clinicians and deep learning models: Cross‐sectional comparative reader study
A. I. Oloruntoba, M. Asghari‐Jafarabadi, M. Sashindranath, Å. Ingvar, N. R. Adler, C. Vico‐Alonso, L. Niklasson, A. L. Caixinha, E. Hiscutt, Z. Holmes, K. B. Assersen, S. Adamson, T. Jegathees, T. Bertelsen, V. Velasco‐Tamariz, T. Helkkula, S. Kristiansen, R. Toholka, M. S. Goh, A. Chamberlain, C. McCormack, T. Vestergaard, D. Mehta, T. D. Nguyen, Z. Ge, H. P. Soyer, V. Mar
DOI: https://doi.org/10.1111/jdv.20462
2024-12-12
Journal of the European Academy of Dermatology and Venereology
Abstract: This graphical abstract illustrates the impact of dermoscopic image quality on the performance of clinicians and a CNN model. It presents examples from the 303‐image test set and compares the CNN's AUROC across different image qualities against clinician performance. Background: Skin cancer is a prevalent and clinically significant condition, with early and accurate diagnosis being crucial for improved patient outcomes. Dermoscopy and artificial intelligence (AI) hold promise in enhancing diagnostic accuracy. However, the impact of image quality, particularly high dynamic range (HDR) conversion in smartphone images, on diagnostic performance remains poorly understood. Objective: This study aimed to investigate the effect of varying image qualities, including HDR‐enhanced dermoscopic images, on the diagnostic capabilities of clinicians and a convolutional neural network (CNN) model. Methods: Eighteen dermatology clinicians assessed 303 images of 101 skin lesions that were categorized into three image quality groups: low quality (LQ), high quality (HQ) and enhanced quality (EQ) produced using HDR‐style conversion. Clinicians participated in a two‐part reader study that required their diagnosis, management and confidence level for each image assessed. Results: In the binary classification of lesions, clinicians had the greatest diagnostic performance with HQ images, with sensitivity (77.3%; CI 69.1–85.5), specificity (63.1%; CI 53.7–72.5) and accuracy (70.2%; CI 61.3–79.1). For the multiclass classification, the overall performance was also best with HQ images, attaining the greatest specificity (91.9%; CI 83.2–95.0) and accuracy (51.5%; CI 48.4–54.7). Clinicians outperformed the CNN model (median correct diagnoses) for the binary classification of LQ and EQ images, but their performance was comparable on the HQ images. However, in the multiclass classification, the CNN model significantly outperformed the clinicians on HQ images (p
dermatology