MedImg: a database for public medical image integration

Bitao Zhong,Rui Fan,Xiangwen Ji,Qinghua Cui,Chunmei Cui
DOI: https://doi.org/10.1101/2024.04.16.589768
2024-04-20
Abstract: The advancement of deep learning algorithms in medical image analysis has garnered tremendous attention in recent years. Several studies have reported that these models have matched and even surpassed human performance, yet translating them into clinical practice is still accompanied by various challenges. A major challenge is the lack of large-scale, well-characterized datasets for validating model generalization. Therefore, we collected diverse medical image datasets from multiple public sources, comprising 103 datasets and 1,622,956 images. These images are derived from 14 modalities, such as XR, CT, MRI, OCT, ultrasound, and endoscopy, and from 9 organs, such as lung, brain, eye, and heart. Subsequently, we constructed an online database, MedImg, which incorporates and hierarchically organizes medical images to facilitate data access. MedImg serves as an intuitive, open-access platform that contributes to deep learning-based medical image analysis, accessible at .
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the difficulty of translating deep-learning models for medical image analysis into clinical practice, in particular the lack of large-scale, multi-modal, multi-organ, well-annotated datasets for verifying and testing how well these models generalize. Specifically:

1. **Lack of large-scale and diverse datasets**: Although deep-learning algorithms have made remarkable progress in medical image analysis and have even surpassed human performance in some tasks, many challenges remain in translating them into clinical applications. A major obstacle is the lack of large-scale, well-characterized datasets, which are crucial for training, validating, and testing deep-learning models.
2. **Limitations of existing databases**: Most existing medical image databases focus on a single organ or disease, or contain only a single imaging modality (such as X-ray, CT, or MRI), which limits the development of general-purpose deep-learning models.

To address these problems, the authors propose and construct an online medical image database named MedImg. This database integrates diverse medical image datasets from multiple public sources and hierarchically organizes all available data by organ and imaging modality. The characteristics of the MedImg database include:

- **Data scale**: It contains 103 datasets, with a total of 1,622,956 images.
- **Diversity of imaging modalities**: It covers 14 imaging modalities, such as X-ray (XR), computed tomography (CT), magnetic resonance imaging (MRI), optical coherence tomography (OCT), ultrasound, and endoscopy.
- **Organ coverage**: It covers 9 organs, including the lung, brain, eye, and heart.
- **Open access**: It provides an intuitive, easy-to-use platform where users can freely browse, retrieve, and download images.

By establishing such a comprehensive database, MedImg aims to facilitate researchers' rapid access to benchmark datasets, thereby promoting the development of more general-purpose and robust deep-learning algorithms in medical image analysis.
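
The organ-then-modality hierarchy described above can be illustrated with a minimal Python sketch. This is not MedImg's actual schema or API, neither of which is described here: the `DatasetRecord` fields, the `build_index` and `count_images` helpers, and all dataset names and image counts below are hypothetical placeholders chosen only to show the grouping idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata for one public dataset (not MedImg's schema)."""
    name: str
    organ: str
    modality: str
    n_images: int


def build_index(records):
    """Group dataset records into a two-level index: organ -> modality -> records."""
    index = defaultdict(lambda: defaultdict(list))
    for rec in records:
        index[rec.organ][rec.modality].append(rec)
    return index


def count_images(index, organ, modality=None):
    """Sum image counts for an organ, optionally restricted to one modality."""
    by_modality = index.get(organ, {})
    if modality is not None:
        return sum(r.n_images for r in by_modality.get(modality, []))
    return sum(r.n_images for recs in by_modality.values() for r in recs)


# Made-up example records; names and counts are placeholders, not MedImg data.
records = [
    DatasetRecord("example-chest-xr", "lung", "XR", 1200),
    DatasetRecord("example-chest-ct", "lung", "CT", 800),
    DatasetRecord("example-brain-mri", "brain", "MRI", 500),
    DatasetRecord("example-retina-oct", "eye", "OCT", 300),
]

index = build_index(records)
lung_total = count_images(index, "lung")
lung_ct_total = count_images(index, "lung", "CT")
```

Keying first by organ and then by modality mirrors the browsing order the summary describes; a real deployment would more likely back this with a database table than an in-memory dictionary.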