NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images

Amirreza Mahbod, Christine Polak, Katharina Feldmann, Rumsha Khan, Katharina Gelles, Georg Dorffner, Ramona Woitek, Sepideh Hatamikia, Isabella Ellinger
DOI: https://doi.org/10.1038/s41597-024-03117-2
2024-03-15
Scientific Data
Abstract: In computational pathology, automatic nuclei instance segmentation plays an essential role in whole slide image analysis. While many computerized approaches have been proposed for this task, supervised deep learning (DL) methods have shown superior segmentation performance compared to classical machine learning and image processing techniques. However, these models need fully annotated datasets for training, which are challenging to acquire, especially in the medical domain. In this work, we release one of the biggest fully manually annotated datasets of nuclei in Hematoxylin and Eosin (H&E)-stained histological images, called NuInsSeg. This dataset contains 665 image patches with more than 30,000 manually segmented nuclei from 31 human and mouse organs. Moreover, for the first time, we provide additional ambiguous area masks for the entire dataset. These vague areas represent the parts of the images where precise and deterministic manual annotations are impossible, even for human experts. The dataset and detailed step-by-step instructions to generate related segmentation masks are publicly available on the respective repositories.
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses the shortage of fully annotated training data for automatic nuclei instance segmentation, a key step in whole slide image analysis in computational pathology. Supervised deep learning (DL) methods outperform classical machine learning and image processing techniques on this task, but they require fully annotated datasets for training, and such datasets are particularly hard to acquire in the medical domain. To help close this gap, the study releases NuInsSeg, a fully manually annotated dataset of nuclei in H&E-stained histological images. It contains 665 image patches from 31 human and mouse organs, with more than 30,000 manually segmented nuclei. In addition, for the first time, the dataset provides ambiguous area masks for every image. These masks mark the parts of an image where precise and deterministic manual annotation is impossible, even for human experts. They are potentially useful for in-depth analysis and evaluation of automatic nuclei instance segmentation models, for example by showing how much of a model's measured error falls in regions that cannot be annotated reliably. Overall, the paper aims to advance research on nuclei instance segmentation by providing a large, manually annotated dataset that can support both the training and the evaluation of automatic segmentation models.
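One way the ambiguous area masks could enter an evaluation is sketched below: a score is computed only over pixels that are not flagged as ambiguous. This is a minimal illustration, not the evaluation protocol of the paper. The function name `dice_excluding_ambiguous`, the nested-list mask representation, and the toy values are all assumptions made for the example, and the metric shown is a plain binary (foreground versus background) Dice score rather than an instance-level metric.

```python
from typing import Sequence

# Hypothetical representation: a mask is a 2D grid of 0/1 values.
Mask = Sequence[Sequence[int]]


def dice_excluding_ambiguous(pred: Mask, truth: Mask, ambiguous: Mask) -> float:
    """Binary Dice score over pixels that are not flagged as ambiguous."""
    intersection = pred_area = truth_area = 0
    for pred_row, truth_row, amb_row in zip(pred, truth, ambiguous):
        for p, t, a in zip(pred_row, truth_row, amb_row):
            if a:
                # Skip pixels where a reliable manual label is not available.
                continue
            p_fg, t_fg = bool(p), bool(t)
            pred_area += p_fg
            truth_area += t_fg
            intersection += p_fg and t_fg
    if pred_area + truth_area == 0:
        # Both masks are empty on the evaluated pixels: treat as perfect agreement.
        return 1.0
    return 2.0 * intersection / (pred_area + truth_area)


# Toy 3x4 example (values are illustrative, not taken from the dataset).
truth = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
pred = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
ambiguous = [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
no_ambiguous = [[0] * 4 for _ in range(3)]

plain = dice_excluding_ambiguous(pred, truth, no_ambiguous)
masked = dice_excluding_ambiguous(pred, truth, ambiguous)
print(f"Dice over all pixels: {plain:.3f}")
print(f"Dice outside ambiguous areas: {masked:.3f}")
```

In the toy example the only disagreement between prediction and annotation lies inside the ambiguous area, so excluding that area raises the score. Comparing the two numbers gives a rough indication of how much of the measured error sits in regions that human experts could not annotate reliably.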