Dark corner artefact and diagnostic performance of a market‐approved neural network for skin cancer classification

Katharina Sies, Julia K. Winkler, Christine Fink, Felicitas Bardehle, Ferdinand Toberer, Felix K. F. Kommoss, Timo Buhl, Alexander Enk, Albert Rosenberger, Holger A. Haenssle
DOI: https://doi.org/10.1111/ddg.14384
2021-05-10
Abstract:

Background and objectives
Convolutional neural networks (CNN) have proven dermatologist‐level performance in skin lesion classification. Prior to a broader clinical application, an assessment of limitations is crucial. Therefore, the influence of a dark tubular periphery in dermatoscopic images (also called dark corner artefact [DCA]) on the diagnostic performance of a market‐approved CNN for skin lesion classification was investigated.

Patients and methods
A prospective image set of 233 skin lesions (60 malignant, 173 benign) without DCA (control‐set) was modified to show small, medium or large DCA. All 932 images were analyzed by a market‐approved CNN (Moleanalyzer‐Pro®, FotoFinder Systems), providing malignancy scores (range 0–1) with the cut‐off > 0.5 indicating malignancy.

Results
In the control‐set the CNN achieved a sensitivity of 90.0 % (79.9 % – 95.3 %), a specificity of 96.5 % (92.6 % – 98.4 %), and an area under the curve (AUC) of receiver operating characteristics (ROC) of 0.961 (0.932 – 0.989). Comparable diagnostic performance was observed in the DCAsmall‐set and DCAmedium‐set. Conversely, in the DCAlarge‐set significantly increased malignancy scores triggered a significantly decreased specificity (87.9 % [82.2 % – 91.9 %], P < 0.001), non‐significantly increased sensitivity (96.7 % [88.6 % – 99.1 %]) and unchanged ROC‐AUC of 0.962 (0.935 – 0.989).

Conclusions
Convolutional neural network classification was robust in images with small and medium DCA, but impaired in images with large DCA. Physicians should be aware of this limitation when submitting images to CNN classification.
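
The reported sensitivity and specificity figures can be checked with a short calculation. The sketch below rests on two assumptions that the abstract does not state. First, the counts are inferred from the percentages: 54 of 60 malignant and 167 of 173 benign lesions classified correctly in the control‐set, and 58 of 60 and 152 of 173 in the DCAlarge‐set. Second, the intervals are taken to be 95 % Wilson score intervals; the abstract does not name the method, but this choice appears to reproduce the reported bounds to rounding. The function names and the toy scores are hypothetical and are not part of the Moleanalyzer‐Pro software.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Two-sided Wilson score interval for a binomial proportion.

    z = 1.959964 corresponds to a 95 % interval. Using the Wilson method
    here is an assumption; the abstract does not say how its intervals
    were computed.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def confusion_counts(scores, is_malignant, cutoff=0.5):
    """Count TP, FN, TN, FP, calling a lesion malignant if score > cutoff.

    The strict inequality follows the abstract's "cut-off > 0.5".
    """
    tp = fn = tn = fp = 0
    for score, malignant in zip(scores, is_malignant):
        predicted_malignant = score > cutoff
        if malignant:
            tp += predicted_malignant
            fn += not predicted_malignant
        else:
            fp += predicted_malignant
            tn += not predicted_malignant
    return tp, fn, tn, fp


# Counts inferred from the reported percentages (an assumption, see above).
inferred = {
    "control sensitivity": (54, 60),
    "control specificity": (167, 173),
    "DCA-large sensitivity": (58, 60),
    "DCA-large specificity": (152, 173),
}

for name, (k, n) in inferred.items():
    low, high = wilson_interval(k, n)
    print(f"{name}: {k / n:.1%} ({low:.1%} to {high:.1%})")

# Hypothetical toy scores, only to show the cut-off rule in action.
toy_scores = [0.9, 0.4, 0.6, 0.1, 0.5]
toy_labels = [True, True, False, False, False]
print(confusion_counts(toy_scores, toy_labels))
```

Read this way, the drop in specificity in the DCAlarge‐set corresponds to roughly 15 additional benign lesions scored above the cut‐off (152 versus 167 correctly classified out of 173), again on the assumption that the inferred counts are right.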