Weakly supervised deep learning image analysis can differentiate melanoma from naevi on haematoxylin and eosin‐stained histopathology slides
Nigel G. Maher, Homay Danaei Mehr, Cong Cong, Nurudeen A. Adegoke, Ismael A. Vergara, Sidong Liu, Richard A. Scolyer
DOI: https://doi.org/10.1111/jdv.20307
2024-09-03
Journal of the European Academy of Dermatology and Venereology
Abstract: In total, 520 mucocutaneous melanocytic tumours, evenly divided between melanomas and naevi, were labelled only at the whole‐slide or whole‐tissue section level. Data were split 80% for training, 10% for validation and 10% for testing, and three multiple‐instance learning frameworks were evaluated for predicting melanoma versus naevus. Stratified fivefold Monte Carlo cross‐validation was performed. The average AUC was 0.99 for Trans‐MIL, 0.99 for CLAM and 0.97 for DTFD‐MIL.
Background: The broad histomorphological spectrum of melanocytic pathologies requires large data sets to develop accurate and generalisable deep learning (DL)‐based diagnostic pathology classifiers. Weakly supervised DL allows larger training data sets to be used than fully supervised (patch annotation) approaches.
Objectives: To evaluate weakly supervised DL image classifiers for discriminating melanomas from naevi on haematoxylin and eosin (H&E)‐stained pathology slides.
Methods: A representative H&E slide for each of 260 naevi and 260 melanomas from mucocutaneous sites at one tertiary institution was digitised. Clinicopathological features, including thickness and histological subtype, were recorded for each case. Whole‐slide or whole‐tissue section labels were applied. The ground truth was established by consensus diagnosis from two pathologists. Three multiple‐instance learning models, Trans‐MIL, CLAM and DTFD‐MIL, were evaluated at 10×, 20× and 40× magnifications using stratified fivefold Monte Carlo cross‐validation, with 80/10/10 splits for training/validation/test groups, to distinguish melanoma from naevus. Heatmaps were generated to understand model performance.
Results: Patients with naevi were younger than those with melanoma (median age: 51 years vs. 71.5 years) and had a more balanced sex distribution (males: 48.8% vs. 64.2%). The most frequent histological subtypes of naevi and melanomas were dysplastic compound (n = 99, 38.1%) and superficial spreading (n = 124, 47.7%), respectively. At 20× magnification, the average AUC (±1 SD) across test groups was 0.9952 ± 0.006 for Trans‐MIL, 0.9925 ± 0.0052 for CLAM and 0.9708 ± 0.0328 for DTFD‐MIL. Model performance varied with the magnification used. Heatmaps from the two best performing models, Trans‐MIL and CLAM, generally indicated attention on appropriate tissue regions for interpretation.
Conclusions: Weakly supervised DL on pathological slides of common mucocutaneous melanocytic tumours discriminates melanomas from naevi with high accuracy. External validation and further assessment of this method on less frequently occurring histological subtypes and borderline cases are required.
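The evaluation protocol described in the abstract (stratified Monte Carlo cross‐validation with 80/10/10 splits, summarised as mean AUC ± 1 SD over five repeats) can be illustrated with a minimal sketch. It is not the authors' code: the helper names `stratified_split` and `auc` are hypothetical, the scores are synthetic placeholders standing in for a multiple‐instance learning model's slide‐level output, and no model is trained. In a real pipeline the model would be refitted on each training split before scoring the matching test split.

```python
import random
from statistics import mean, stdev


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists that preserve the class balance."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test group
    return train, val, test


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 260 naevi (label 0) and 260 melanomas (label 1), as in the abstract.
labels = [0] * 260 + [1] * 260

# ASSUMPTION: synthetic slide-level scores, not real model predictions.
noise = random.Random(42)
scores = [y + noise.gauss(0, 0.4) for y in labels]

# Monte Carlo cross-validation: five independent random stratified splits,
# so test groups from different repeats may overlap.
fold_aucs = []
for seed in range(5):
    train, val, test = stratified_split(labels, seed=seed)
    fold_aucs.append(auc([labels[i] for i in test], [scores[i] for i in test]))

print(f"AUC: {mean(fold_aucs):.4f} ± {stdev(fold_aucs):.4f}")
```

With 260 cases per class, each repeat yields 416 training, 52 validation and 52 test slides, with both classes equally represented in every group.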
dermatology