BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients

Maria de la Iglesia Vayá, Jose Manuel Saborit, Joaquim Angel Montell, Antonio Pertusa, Aurelia Bustos, Miguel Cazorla, Joaquin Galant, Xavier Barber, Domingo Orozco-Beltrán, Francisco García-García, Marisa Caparrós, Germán González, Jose María Salinas
DOI: https://doi.org/10.48550/arXiv.2006.01174
2020-06-05
Abstract: This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray (CXR) images (CR, DX) and computed tomography (CT) images of COVID-19+ patients, along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, and the results of polymerase chain reaction (PCR) tests and immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests. The findings have been mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, unlike the considerably smaller number of entities annotated in previous datasets. Images are stored in high resolution, and entities are localized with anatomical labels and stored in Medical Imaging Data Structure (MIDS) format. In addition, 10 images were annotated by a team of radiologists to include semantic segmentation of radiological findings. This first iteration of the database includes 1,380 CR, 885 DX and 163 CT studies from 1,311 COVID-19+ patients. This is, to the best of our knowledge, the largest COVID-19+ dataset of images available in an open format. The dataset can be downloaded from http://bimcv.cipf.es/bimcv-projects/bimcv-covid19.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning