Outlier detection in dermatology: Performance of different convolutional neural networks for binary classification of inflammatory skin diseases

Maximilian C Schielein, Joshua Christl, Sebastian Sitaru, Anna Caroline Pilz, Robert Kaczmarczyk, Tilo Biedermann, Tobias Lasser, Alexander Zink
DOI: https://doi.org/10.1111/jdv.18853
Abstract:
Background: Artificial intelligence (AI) and convolutional neural networks (CNNs) represent rising trends in modern medicine. However, comprehensive data on the performance of AI practices in clinical dermatologic images are non-existent. Furthermore, the role of professional data selection for training remains unknown.
Objectives: The aims of this study were to develop AI applications for outlier detection of dermatological pathologies, to evaluate CNN architectures' performance on dermatological images and to investigate the role of professional pre-processing of the training data, serving as one of the first anchor points regarding data selection criteria in dermatological AI-based binary classification tasks of non-melanoma pathologies.
Methods: Six state-of-the-art CNN architectures were evaluated for their accuracy, sensitivity and specificity for five dermatological diseases and using five data subsets, including data selected by two dermatologists, one with 5 and the other with 11 years of clinical experience.
Results: Overall, 150 CNNs were evaluated on up to 4051 clinical images. The best accuracy was reached for onychomycosis (accuracy = 1.000), followed by bullous pemphigoid (accuracy = 0.951) and lupus erythematosus (accuracy = 0.912). The CNNs InceptionV3, Xception and ResNet50 achieved the best accuracy in 9, 8 and 6 out of 25 data sets, respectively (36.0%, 32.0% and 24.0%). On average, the data set provided by the senior physician and the data set provided in accordance with both dermatologists performed the best (accuracy = 0.910).
Conclusions: This AI approach for the detection of outliers in dermatological diagnoses represents one of the first studies to evaluate the performance of different CNNs for binary decisions in clinical non-dermatoscopic images of a variety of dermatological diseases other than melanoma. The selection of images by an experienced dermatologist during pre-processing had substantial benefits for the performance of the CNNs. These comparative results might guide future AI approaches to dermatology diagnostics, and the evaluated CNNs might be applicable for the future training of dermatology residents.
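The abstract reports accuracy, sensitivity and specificity for binary classifiers and gives a few counts that can be checked by hand. The sketch below shows the standard definitions of those three metrics and reproduces the reported arithmetic (6 architectures x 5 diseases x 5 data subsets = 150 CNNs; 9, 8 and 6 of 25 data sets = 36.0%, 32.0% and 24.0%). It is an illustration only, not the authors' evaluation code: the `BinaryCounts` class, the function names and the example confusion-matrix counts are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Confusion-matrix counts for one binary classifier (hypothetical container)."""
    tp: int  # diseased images correctly flagged
    fp: int  # non-diseased images wrongly flagged
    tn: int  # non-diseased images correctly passed
    fn: int  # diseased images missed


def accuracy(c: BinaryCounts) -> float:
    """Share of all images classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: BinaryCounts) -> float:
    """Share of diseased images that were detected (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: BinaryCounts) -> float:
    """Share of non-diseased images that were correctly passed (true negative rate)."""
    return c.tn / (c.tn + c.fp)


# Study design figures taken from the abstract.
n_architectures, n_diseases, n_subsets = 6, 5, 5
n_models = n_architectures * n_diseases * n_subsets  # one CNN per combination
n_datasets = n_diseases * n_subsets                  # disease x data-subset pairs

# Number of data sets on which each architecture reached the best accuracy.
best_counts = {"InceptionV3": 9, "Xception": 8, "ResNet50": 6}
shares = {name: round(100 * k / n_datasets, 1) for name, k in best_counts.items()}

# Made-up counts, NOT from the paper, used only to exercise the metric functions.
example = BinaryCounts(tp=45, fp=5, tn=40, fn=10)

if __name__ == "__main__":
    print(n_models, n_datasets, shares)
    print(accuracy(example), sensitivity(example), specificity(example))
```

The functions assume that both classes are present in the evaluated images; with an empty class, sensitivity or specificity is undefined and the division would fail.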