DSCC: a Data Set of Cervical Cell Images for Cervical Cytology Screening

Hua Chen, Juan Liu, Yu Jin, Baochuan Pang, Dehua Cao, Di Xiao
DOI: https://doi.org/10.1504/ijdmb.2022.130325
2022-01-01
International Journal of Data Mining and Bioinformatics
Abstract: The lack of large-scale public datasets for cytological screening of cervical cancer has hindered the development of robust screening models. To address this problem, we present DSCC, a dataset of 15,509 cervical cell images labelled by experienced cytologists. To our knowledge, DSCC contains nearly four times as many cell images as the largest existing dataset. Because the purpose of cytological screening is not to diagnose cancer but to judge whether the subject needs further examination, we classify the cell images into three categories: Normal; SIL (squamous intra-epithelial lesion or cancer cell, suggesting further examination); and ASC (atypical squamous cell, to be confirmed by a professional cytologist). We also provide a nucleus mask map for each cell, based on the cytologists' annotations, to support a range of studies. From the mask map, we extract 78 features for each cell, which are also included in the dataset. Experimental results demonstrate that DSCC is useful for building classification methods for automatic cervical cytology screening.
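
The abstract does not list the 78 features, so the following is only a minimal sketch of what mask-based feature extraction can look like. It assumes a grayscale cell image and a binary nucleus mask of the same shape. The function name `nucleus_features` and the six features it returns are hypothetical illustrations, not the features shipped with DSCC.

```python
import numpy as np

def nucleus_features(image, mask):
    """Compute a few simple shape and intensity features from a nucleus mask.

    Hypothetical helper: these features are illustrative assumptions,
    not the 78 features distributed with DSCC.
    """
    mask = np.asarray(mask, dtype=bool)
    image = np.asarray(image, dtype=float)
    area = int(mask.sum())  # number of nucleus pixels
    if area == 0:
        raise ValueError("mask contains no nucleus pixels")

    # Bounding box of the nucleus region
    rows, cols = np.nonzero(mask)
    height = int(rows.max() - rows.min() + 1)
    width = int(cols.max() - cols.min() + 1)

    pixels = image[mask]  # intensities inside the nucleus
    return {
        "area": area,
        "bbox_height": height,
        "bbox_width": width,
        "extent": area / (height * width),  # fraction of the box that is filled
        "mean_intensity": float(pixels.mean()),
        "std_intensity": float(pixels.std()),
    }

# Toy example: a 5x5 image with a 3x2 rectangular "nucleus"
image = np.arange(25, dtype=float).reshape(5, 5)
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:3] = True
features = nucleus_features(image, mask)
```

Feature dictionaries of this kind could be stacked into a table with one row per cell and passed to any standard classifier for the three categories (Normal, SIL, ASC).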