AI‐powered visual diagnosis of vulvar lichen sclerosus: A pilot study

Philippe Gottfrois, Jie Zhu, Alexandra Steiger, Ludovic Amruthalingam, Andre B. Kind, Viola Heinzelmann, Claudia Mang, Alexander A. Navarini, Simon M. Mueller
DOI: https://doi.org/10.1111/jdv.20306
2024-08-31
Journal of the European Academy of Dermatology and Venereology
Abstract: In this AI‐powered pilot study, images were collected from three sources: a hospital database, an art collective and the public via a self‐established website. This yielded an initial set of 757 images of vulvar lichen sclerosus (VLS) and 424 images of healthy vulvas and other vulvar conditions. Data cleaning reduced the collection to 1087 usable images: 684 VLS and 403 non‐VLS. The data set was then split into a training set and a test set. The AI model developed from the training set was evaluated on the test set, achieving an average recall of 0.94, precision of 0.99 and accuracy of 0.95 over three runs.

Background: Vulvar lichen sclerosus (VLS) is a chronic inflammatory skin condition associated with significant impairment of quality of life and a potential risk of malignant transformation. However, diagnosis of VLS is often delayed because of its variable clinical presentation and shame‐related late consultation. Machine learning (ML)‐trained image recognition software could potentially facilitate early diagnosis of VLS.

Objective: To develop an ML‐trained image‐based model for the detection of VLS.

Methods: Images of both VLS and non‐VLS anogenital skin were collected, anonymized and selected. In the VLS images, 10 typical skin signs (whitening, hyperkeratosis, purpura/ecchymosis, erosion/ulcers/excoriation, erythema, labial fusion, narrowing of the introitus, labia minora resorption, posterior commissure (fourchette) band formation and atrophic shiny skin) were manually labelled. A deep convolutional neural network was built using the training set as input data and then evaluated on the test set; the developed algorithm was run three times and the results were averaged.

Results: A total of 684 VLS images and 403 non‐VLS images (70% healthy vulva and 30% with other vulvar diseases) were included after the selection process. A deep learning algorithm was developed by training on 775 images (469 VLS and 306 non‐VLS) and testing on 312 images (215 VLS and 97 non‐VLS). The algorithm discriminated accurately between VLS and non‐VLS cases (including healthy individuals and non‐VLS dermatoses), with mean recall, precision and accuracy of 0.94, 0.99 and 0.95, respectively.

Conclusion: This pilot project demonstrated that our image‐based deep learning model can effectively discriminate between VLS and non‐VLS skin, representing a promising tool for future use by clinicians and possibly patients. However, prospective studies are needed to validate the applicability and accuracy of our model in a real‐world setting.
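The reported recall, precision and accuracy are means over three evaluation runs on the 312-image test set. The sketch below shows one way such averages could be computed from per-run confusion-matrix counts, treating VLS as the positive class. The abstract does not publish confusion matrices, so the counts in `runs` are hypothetical placeholders chosen only to match the stated test-set sizes (215 VLS, 97 non-VLS); the `Confusion` class and `average_metrics` function are illustrative names, not the authors' code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts for one run (VLS = positive class)."""
    tp: int  # VLS images predicted as VLS
    fn: int  # VLS images predicted as non-VLS
    fp: int  # non-VLS images predicted as VLS
    tn: int  # non-VLS images predicted as non-VLS

    def recall(self) -> float:
        # Share of true VLS images that the model detects.
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        # Share of VLS predictions that are actually VLS.
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        # Share of all images classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


def average_metrics(runs: list[Confusion]) -> dict[str, float]:
    """Average each metric across runs (unweighted mean of per-run values)."""
    return {
        "recall": mean(r.recall() for r in runs),
        "precision": mean(r.precision() for r in runs),
        "accuracy": mean(r.accuracy() for r in runs),
    }


# HYPOTHETICAL counts, not taken from the paper. Each run sums to the stated
# test set: tp + fn = 215 VLS images, fp + tn = 97 non-VLS images.
runs = [
    Confusion(tp=200, fn=15, fp=2, tn=95),
    Confusion(tp=203, fn=12, fp=3, tn=94),
    Confusion(tp=202, fn=13, fp=1, tn=96),
]

if __name__ == "__main__":
    for name, value in average_metrics(runs).items():
        print(f"{name}: {value:.2f}")
```

Averaging per-run metrics, rather than pooling counts across runs, matches the abstract's description of running the algorithm three times and averaging the results. Whether the authors computed it exactly this way is an assumption.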
dermatology