CellBinDB: A Large-Scale Multimodal Annotated Dataset for Cell Segmentation with Benchmarking of Universal Models

Can Shi, Jinghong Fan, Zhonghan Deng, Huanlin Liu, Qiang Kang, Yumei Li, Jing Guo, Jingwen Wang, Jinjiang Gong, Sha Liao, Ao Chen, Ying Zhang, Mei Li
DOI: https://doi.org/10.1101/2024.11.20.619750
2024-11-21
Abstract: In recent years, advances in cell segmentation techniques have played a critical role in the analysis of biological images, especially for quantitative studies. Deep learning models have shown remarkable performance in segmenting cell and nucleus boundaries, but they are often designed for specific modalities or require human intervention to select hyper-parameters, and they generalize poorly to out-of-sample data. Building universal cell segmentation models can address these challenges but requires a large amount of multimodal training data. Here, we present CellBinDB, a large-scale multimodal annotated dataset established for cell segmentation. CellBinDB contains more than 1,000 annotated images of DAPI, ssDNA, H&E, and mIF staining, covering more than 30 normal and diseased tissue types from human and mouse samples. Based on CellBinDB, we benchmarked six state-of-the-art cell segmentation models and one widely used software tool. Evaluations were performed on the entire dataset and on each staining type, with Cellpose performing outstandingly. In addition, we analyzed the effects of four cell morphology indicators and image gradient on the segmentation results.
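The abstract does not state which metrics the benchmark used. As a hedged illustration of how instance-level cell segmentation is commonly scored, the sketch below computes an F1 score by matching ground-truth and predicted cells in two integer label images (0 = background) by intersection-over-union (IoU). The helper names `iou_matrix` and `f1_at_iou` and the default 0.5 IoU threshold are assumptions for illustration, not CellBinDB's evaluation code.

```python
import numpy as np


def iou_matrix(gt: np.ndarray, pred: np.ndarray) -> np.ndarray:
    """Pairwise IoU between every ground-truth and every predicted cell.

    Both inputs are integer label images of the same shape; 0 is background
    and each positive integer marks one cell instance.
    """
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    ious = np.zeros((len(gt_ids), len(pred_ids)))
    for a, g in enumerate(gt_ids):
        g_mask = gt == g
        for b, p in enumerate(pred_ids):
            p_mask = pred == p
            inter = np.count_nonzero(g_mask & p_mask)
            if inter:  # skip the union computation for non-overlapping pairs
                ious[a, b] = inter / np.count_nonzero(g_mask | p_mask)
    return ious


def f1_at_iou(gt: np.ndarray, pred: np.ndarray, threshold: float = 0.5) -> float:
    """Instance-level F1 (hypothetical helper).

    A predicted cell is a true positive if it overlaps a ground-truth cell
    with IoU above `threshold`. Because cells in a label image are disjoint,
    any threshold >= 0.5 guarantees each cell has at most one such partner,
    so counting qualifying pairs gives a valid one-to-one matching.
    """
    if threshold < 0.5:
        raise ValueError("one-to-one matching is only guaranteed for threshold >= 0.5")
    ious = iou_matrix(gt, pred)
    n_gt, n_pred = ious.shape
    if n_gt + n_pred == 0:
        return 1.0  # nothing to find and nothing predicted
    tp = int(np.count_nonzero(ious > threshold))
    fp = n_pred - tp  # predictions with no matching ground-truth cell
    fn = n_gt - tp    # ground-truth cells that were missed
    return 2 * tp / (2 * tp + fp + fn)
```

Scoring each image this way and averaging per staining type would give the kind of per-modality comparison the abstract describes. The nested loop is quadratic in the number of cells, which is acceptable for small tiles but slow for large label images.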
Biology
What problem does this paper attempt to address?
The main problem this paper addresses is the lack of generality in existing cell segmentation techniques. Specifically:

1. **Limitations of existing datasets**: Available datasets are limited in scale, staining techniques, and tissue-type diversity, which restricts the development of general-purpose cell segmentation models. For example, many existing datasets contain only a few staining types or specific types of tissue samples, so they cannot support the development of widely applicable cell segmentation models.
2. **Generality and robustness of models**: Existing cell segmentation models are usually designed for specific modalities, require manual selection of hyper-parameters, and perform poorly on unseen data. Their performance is inconsistent across staining types and tissue types, so they lack generality and robustness.

To address these issues, the paper introduces **CellBinDB**, a large-scale multimodal annotated dataset aimed at promoting the development of general-purpose cell segmentation models. CellBinDB contains more than 1,000 annotated images covering four staining types (DAPI, ssDNA, H&E, and mIF) and more than 30 normal and diseased tissue types from humans and mice. Based on CellBinDB, the researchers benchmarked six state-of-the-art cell segmentation models and one widely used software tool, evaluated their performance on each staining type, and analyzed how cell morphological indicators and image gradients influence the segmentation results. Through this work, the paper aims to advance cell segmentation techniques, improve the generality and robustness of models, and thereby better support quantitative analysis in biomedical research.
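The summary does not name the four morphology indicators or say how image gradient was measured. As a hedged sketch under those gaps, the code below computes a few common per-cell descriptors (area, a pixel-edge perimeter, and circularity $4\pi A / P^2$) together with the mean intensity-gradient magnitude along each cell's boundary. These specific descriptors, the boundary-gradient definition, and the helper names (`crack_perimeter`, `boundary_pixels`, `cell_indicators`) are assumptions chosen for illustration; they show how such quantities could be extracted from a label image and its source image so they can later be related to per-cell segmentation accuracy.

```python
import numpy as np


def crack_perimeter(mask: np.ndarray) -> int:
    """Perimeter as the number of pixel edges between the cell and background.

    A simple approximation (assumed here); it overestimates diagonal edges.
    """
    padded = np.pad(mask, 1, constant_values=False)
    vertical = np.count_nonzero(padded[1:, :] != padded[:-1, :])
    horizontal = np.count_nonzero(padded[:, 1:] != padded[:, :-1])
    return int(vertical + horizontal)


def boundary_pixels(mask: np.ndarray) -> np.ndarray:
    """Cell pixels with at least one 4-connected neighbour outside the cell."""
    p = np.pad(mask, 1, constant_values=False)
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    return mask & ~interior


def cell_indicators(labels: np.ndarray, image: np.ndarray) -> dict:
    """Per-cell descriptors keyed by label id (hypothetical helper).

    `labels` is an integer label image (0 = background); `image` is the
    stained image the labels were drawn on.
    """
    # np.gradient returns derivatives along axis 0 (rows) then axis 1 (cols)
    gy, gx = np.gradient(image.astype(float))
    grad_mag = np.hypot(gx, gy)
    stats = {}
    for cell_id in np.unique(labels):
        if cell_id == 0:
            continue
        mask = labels == cell_id
        area = int(mask.sum())
        perimeter = crack_perimeter(mask)
        stats[int(cell_id)] = {
            "area": area,
            "perimeter": perimeter,
            # 1.0 for an ideal circle; lower for elongated or irregular cells
            "circularity": 4 * np.pi * area / perimeter ** 2,
            # high values suggest a sharp, easily detected cell edge
            "mean_boundary_gradient": float(grad_mag[boundary_pixels(mask)].mean()),
        }
    return stats
```

Binning cells by one of these values and computing a matching score within each bin is one way to ask whether, for example, low-contrast boundaries make cells harder to segment.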