SAROS: A dataset for whole-body region and organ segmentation in CT imaging

Sven Koitka, Giulia Baldini, Lennard Kroll, Natalie van Landeghem, Olivia B. Pollok, Johannes Haubold, Obioma Pelka, Moon Kim, Jens Kleesiek, Felix Nensa, René Hosch
DOI: https://doi.org/10.1038/s41597-024-03337-6
2024-05-11
Scientific Data
Abstract: The Sparsely Annotated Region and Organ Segmentation (SAROS) dataset was created using data from The Cancer Imaging Archive (TCIA) to provide a large open-access CT dataset with high-quality annotations of body landmarks. In-house segmentation models were employed to generate annotation proposals on randomly selected cases from TCIA. The dataset includes 13 semantic body region labels (abdominal/thoracic cavity, bones, brain, breast implant, mediastinum, muscle, parotid/submandibular/thyroid glands, pericardium, spinal cord, subcutaneous tissue) and six body part labels (left/right arm/leg, head, torso). Case selection was based on the DICOM series description, gender, and imaging protocol, resulting in 882 patients (438 female) for a total of 900 CTs. The proposals were manually reviewed and corrected in a continuous quality control cycle. Only every fifth axial slice was annotated, yielding 20150 annotated slices from 28 data collections. For reproducibility on downstream tasks, five cross-validation folds and a test set were pre-defined. The SAROS dataset serves as an open-access resource for training and evaluating novel segmentation models, covering various scanner vendors and diseases.
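The sparse annotation scheme described in the abstract (only every fifth axial slice annotated) can be illustrated with a minimal sketch. The function names, the `offset` parameter, and the assumption that annotation starts at slice index 0 are hypothetical and are not taken from the dataset's documentation; the released files may mark unannotated slices differently.

```python
def annotated_slice_indices(num_slices, step=5, offset=0):
    """Return the axial slice indices that carry annotations under a fixed stride.

    Assumption: annotations fall on indices offset, offset + step, ...
    The real starting slice per CT is not stated in the abstract.
    """
    if num_slices < 0 or step <= 0 or not 0 <= offset < step:
        raise ValueError("expected num_slices >= 0, step > 0, 0 <= offset < step")
    return list(range(offset, num_slices, step))


def sparse_slice_mask(num_slices, step=5, offset=0):
    """Boolean mask over axial slices: True where a slice is annotated."""
    annotated = set(annotated_slice_indices(num_slices, step, offset))
    return [i in annotated for i in range(num_slices)]


# Figures from the abstract: 20150 annotated slices over 900 CTs,
# i.e. roughly 22 annotated slices per CT on average.
mean_annotated_per_ct = 20150 / 900
```

A mask of this kind could be used, for example, to restrict a training loss or an evaluation metric to annotated slices only.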
multidisciplinary sciences