SynFundus-1M: A High-quality Million-scale Synthetic fundus images Dataset with Fifteen Types of Annotation

Fangxin Shang, Jie Fu, Yehui Yang, Haifeng Huang, Junwei Liu, Lei Ma

2024-03-14

Abstract: Large-scale public datasets with high-quality annotations are rarely available for intelligent medical imaging research, due to data privacy concerns and the cost of annotation. In this paper, we release SynFundus-1M, a high-quality synthetic dataset containing over one million fundus images annotated for eleven disease types. Furthermore, we deliberately assign four readability labels to the key regions of the fundus images. To the best of our knowledge, SynFundus-1M is currently the largest fundus dataset with the most sophisticated annotations. Leveraging over 1.3 million private authentic fundus images from various scenarios, we trained a powerful Denoising Diffusion Probabilistic Model, named SynFundus-Generator. The released SynFundus-1M images are generated by SynFundus-Generator under predefined conditions. To demonstrate the value of SynFundus-1M, extensive experiments are designed around the following aspects: 1) Authenticity of the images: we randomly blend the synthetic images with authentic fundus images, and find that experienced annotators can hardly distinguish the synthetic images from authentic ones. Moreover, we show that the disease-related visual features (e.g., lesions) are well simulated in the synthetic images. 2) Effectiveness for downstream fine-tuning and pretraining: we demonstrate that retinal disease diagnosis models of either convolutional neural network (CNN) or Vision Transformer (ViT) architectures can benefit from SynFundus-1M. Compared to the datasets commonly used for pretraining, models trained on SynFundus-1M not only achieve superior performance but also converge faster on various downstream tasks. SynFundus-1M is already publicly available to the open-source community.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The main problem this paper attempts to address is the scarcity of high-quality, large-scale public datasets in medical imaging research, particularly in the field of fundus image analysis. Due to data privacy and annotation cost constraints, existing public datasets are usually limited in number and have fewer annotation categories, which cannot meet the needs of training deep learning models. To solve this problem, the authors propose SynFundus-1M, a dataset containing over 1 million synthetic fundus images with detailed annotations of 15 types (including 11 disease labels and 4 image readability labels). These high-quality synthetic images were generated by training a powerful denoising diffusion probabilistic model (SynFundus-Generator) using a large number of real fundus images. Experimental results show that these synthetic images are not only visually indistinguishable from real images but also significantly improve model performance and convergence speed when used for downstream tasks such as diabetic retinopathy grading and glaucoma diagnosis. Additionally, the release of SynFundus-1M aims to promote research in the field of medical image analysis while protecting data privacy.
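The 15 annotation types described above (11 disease labels plus 4 readability labels) can be pictured as a single per-image label vector. The following is a minimal sketch of that layout, not the dataset's actual file format: the names `FundusAnnotation` and `split_annotation`, the ordering of the labels, and the use of floating-point scores are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Counts taken from the abstract; everything else here is hypothetical.
NUM_DISEASE_LABELS = 11
NUM_READABILITY_LABELS = 4


@dataclass
class FundusAnnotation:
    """Hypothetical container for one image's 15 annotations."""
    disease: List[float]      # assumed: one score per disease type
    readability: List[float]  # assumed: one score per key region


def split_annotation(vector: Sequence[float]) -> FundusAnnotation:
    """Split a flat 15-element vector into disease and readability parts.

    Assumes the disease labels come first; the real dataset may order
    or encode its labels differently.
    """
    expected = NUM_DISEASE_LABELS + NUM_READABILITY_LABELS
    if len(vector) != expected:
        raise ValueError(f"expected {expected} labels, got {len(vector)}")
    return FundusAnnotation(
        disease=list(vector[:NUM_DISEASE_LABELS]),
        readability=list(vector[NUM_DISEASE_LABELS:]),
    )
```

Keeping the two groups separate in this way would let a downstream model filter on readability before training on the disease labels, although whether the authors do so is not stated in the abstract.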

SynFundus: A Synthetic Fundus Images Dataset with Millions of Samples and Multi-Disease Annotations

Retinal Image Synthesis from Multiple-Landmarks Input with Generative Adversarial Networks.

Non-Invasive to Invasive: Enhancing FFA Synthesis from CFP with a Benchmark Dataset and a Novel Network

Robust deep learning for eye fundus images: Bridging real and synthetic data for enhancing generalization

A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks

An Unsupervised Fundus Image Enhancement Method with Multi-Scale Transformer and Unreferenced Loss

Multiple Lesions Insertion: boosting diabetic retinopathy screening through Poisson editing

Synthesizing New Retinal Symptom Images by Multiple Generative Models

Synthetic Medical Images from Dual Generative Adversarial Networks

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

IRFundusSet: An Integrated Retinal Fundus Dataset with a Harmonized Healthy Label

Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound

Leveraging Regular Fundus Images for Training UWF Fundus Diagnosis Models via Adversarial Learning and Pseudo-Labeling

FIVES: A Fundus Image Dataset for Artificial Intelligence based Vessel Segmentation

VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image Analysis

Interpretable Detection of Diabetic Retinopathy, Retinal Vein Occlusion, Age-Related Macular Degeneration, and Other Fundus Conditions

2D medical image synthesis using transformer-based denoising diffusion probabilistic model

RetiGen: A Framework for Generalized Retinal Diagnosis Using Multi-View Fundus Images

MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images