NuInsSeg: A Fully Annotated Dataset for Nuclei Instance Segmentation in H&E-Stained Histological Images

Amirreza Mahbod, Christine Polak, Katharina Feldmann, Rumsha Khan, Katharina Gelles, Georg Dorffner, Ramona Woitek, Sepideh Hatamikia, Isabella Ellinger
2023-08-03
Abstract: In computational pathology, automatic nuclei instance segmentation plays an essential role in whole slide image analysis. While many computerized approaches have been proposed for this task, supervised deep learning (DL) methods have shown superior segmentation performance compared to classical machine learning and image processing techniques. However, these models need fully annotated datasets for training, which are challenging to acquire, especially in the medical domain. In this work, we release one of the biggest fully manually annotated datasets of nuclei in Hematoxylin and Eosin (H&E)-stained histological images, called NuInsSeg. This dataset contains 665 image patches with more than 30,000 manually segmented nuclei from 31 human and mouse organs. Moreover, for the first time, we provide additional ambiguous area masks for the entire dataset. These vague areas represent the parts of the images where precise and deterministic manual annotations are impossible, even for human experts. The dataset and detailed step-by-step instructions to generate related segmentation masks are publicly available at https://www.kaggle.com/datasets/ipateam/nuinsseg and https://github.com/masih4/NuInsSeg, respectively.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the shortage of fully annotated training data for automatic nuclei instance segmentation, a key step in whole slide image (WSI) analysis in computational pathology. Many computerized methods have been proposed for this task, and supervised deep learning (DL) methods outperform classical machine learning and image processing techniques. However, these models require fully annotated datasets for training, which are particularly hard to obtain in the medical domain. To tackle this, the authors release NuInsSeg, one of the largest fully manually annotated nuclei instance segmentation datasets for Hematoxylin and Eosin (H&E)-stained histological images. It contains 665 image patches from 31 human and mouse organs, with more than 30,000 manually segmented nuclei. For the first time, the dataset also provides ambiguous area masks for every image, marking regions that even human experts cannot annotate precisely and deterministically. The release is intended as a resource for developing, testing, and evaluating machine learning algorithms for nuclei instance segmentation, and as an independent test set for estimating how well existing segmentation methods generalize.
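One way ambiguous area masks like these are commonly used is as an "ignore" region during evaluation, so that a model is not penalized for predictions in pixels where no reliable ground truth exists. The sketch below illustrates that idea on small synthetic arrays. It is an assumption about how such masks might be applied, not the authors' documented evaluation protocol; the function name `dice_ignoring_ambiguous` and the toy arrays are hypothetical and do not reflect the dataset's actual file layout.

```python
import numpy as np


def dice_ignoring_ambiguous(pred, target, ambiguous):
    """Dice score computed only over pixels not flagged as ambiguous.

    All three inputs are 2D arrays of the same shape; nonzero means
    "foreground" (pred, target) or "ambiguous" (ambiguous).
    This is an illustrative helper, not part of the NuInsSeg release.
    """
    valid = ~ambiguous.astype(bool)          # pixels we trust
    p = pred.astype(bool) & valid            # predicted foreground, trusted only
    t = target.astype(bool) & valid          # annotated foreground, trusted only
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0                           # both empty: treat as perfect match
    return 2.0 * (p & t).sum() / denom


# Toy 4x4 example: a 2x2 annotated nucleus in the centre.
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 1

# The prediction matches the nucleus but adds one extra pixel at (0, 0).
pred = target.copy()
pred[0, 0] = 1

# Suppose that extra pixel lies in an ambiguous area.
ambiguous = np.zeros((4, 4), dtype=np.uint8)
ambiguous[0, 0] = 1

plain = dice_ignoring_ambiguous(pred, target, np.zeros_like(ambiguous))
masked = dice_ignoring_ambiguous(pred, target, ambiguous)
print(f"Dice without ignore mask: {plain:.3f}")
print(f"Dice with ambiguous pixels ignored: {masked:.3f}")
```

In this toy case the extra predicted pixel counts as a false positive when no mask is applied, but it is excluded from the score once it is marked ambiguous. Whether to report scores with or without such masking is a design choice that should be stated alongside any results.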