S-SYNTH: Knowledge-Based, Synthetic Generation of Skin Images

Andrea Kim, Niloufar Saharkhiz, Elena Sizikova, Miguel Lago, Berkman Sahiner, Jana Delfino, Aldo Badano
2024-08-01
Abstract: Development of artificial intelligence (AI) techniques in medical imaging requires access to large-scale and diverse datasets for training and evaluation. In dermatology, obtaining such datasets remains challenging due to significant variations in patient populations, illumination conditions, and acquisition system characteristics. In this work, we propose S-SYNTH, the first knowledge-based, adaptable open-source skin simulation framework to rapidly generate synthetic skin, 3D models and digitally rendered images, using an anatomically inspired multi-layer, multi-component skin and growing lesion model. The skin model allows for controlled variation in skin appearance, such as skin color, presence of hair, lesion shape, and blood fraction among other parameters. We use this framework to study the effect of possible variations on the development and evaluation of AI models for skin lesion segmentation, and show that results obtained using synthetic data follow similar comparative trends as real dermatologic images, while mitigating biases and limitations from existing datasets including small dataset size, lack of diversity, and underrepresentation.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper addresses several challenges in using skin images to develop artificial intelligence (AI) techniques for medical imaging:

1. **Insufficient Dataset Diversity**: Existing skin image datasets are typically small and lack diversity, especially in skin color. For example, darker skin tones are often underrepresented in publicly available datasets.
2. **Time-Consuming and Difficult Annotation**: Segmenting and annotating skin images is slow and challenging, which limits the number of samples in publicly available datasets. As a result, those datasets may not fairly represent the target patient population.
3. **Dataset Bias**: These limitations may introduce bias during training and evaluation, particularly in skin lesion segmentation, where the impact on darker skin tones is especially significant.

To address these issues, the authors propose **S-SYNTH**, a knowledge-based, adaptable open-source skin simulation framework. The framework rapidly generates synthetic skin, 3D models, and digitally rendered images, and allows control over parameters that alter skin appearance, such as skin color, presence of hair, lesion shape, and blood fraction.

With S-SYNTH, researchers can generate diverse synthetic images that mitigate the small size, lack of diversity, and underrepresentation found in existing datasets. The paper also shows that skin lesion segmentation results obtained with synthetic data follow comparative trends similar to those obtained with real dermatologic images.
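
The idea of controlled variation can be illustrated with a minimal Python sketch. It does not use the S-SYNTH code or its API: the class `SkinSample`, the function `parameter_grid`, the field names, and all numeric values are hypothetical, chosen only to show how enumerating every combination of appearance parameters yields a dataset in which each skin tone is equally represented.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SkinSample:
    """Hypothetical parameter set for one synthetic image (not the S-SYNTH API)."""
    melanin_fraction: float  # assumed stand-in for skin color
    blood_fraction: float
    has_hair: bool
    lesion_shape: str


def parameter_grid(melanin, blood, hair, shapes):
    """Enumerate every combination of the given parameter values."""
    return [
        SkinSample(m, b, h, s)
        for m, b, h, s in product(melanin, blood, hair, shapes)
    ]


# Illustrative values only; they are not taken from the paper.
grid = parameter_grid(
    melanin=[0.05, 0.20, 0.40],
    blood=[0.02, 0.05],
    hair=[False, True],
    shapes=["regular", "irregular"],
)

# Count samples per skin tone to confirm the grid is balanced.
per_tone = {}
for sample in grid:
    per_tone[sample.melanin_fraction] = per_tone.get(sample.melanin_fraction, 0) + 1

print(len(grid), per_tone)
```

Because the grid is a full Cartesian product, every skin tone value appears with the same number of hair, blood fraction, and lesion shape combinations, which is the kind of balance that real datasets with underrepresented darker skin tones lack.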