Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data

Hayden Gunraj, Chi-en Amy Tai, Alexander Wong
2023-11-20
Abstract: The recent introduction of synthetic correlated diffusion (CDI$^s$) imaging has demonstrated significant potential in the realm of clinical decision support for prostate cancer (PCa). CDI$^s$ is a new form of magnetic resonance imaging (MRI) designed to characterize tissue characteristics through the joint correlation of diffusion signal attenuation across different Brownian motion sensitivities. Despite the performance improvement, the CDI$^s$ data for PCa has not been previously made publicly available. In our commitment to advance research efforts for PCa, we introduce Cancer-Net PCa-Data, an open-source benchmark dataset of volumetric CDI$^s$ imaging data of PCa patients. Cancer-Net PCa-Data consists of CDI$^s$ volumetric images from a patient cohort of 200 patient cases, along with full annotations (gland masks, tumor masks, and PCa diagnosis for each tumor). We also analyze the demographic and label region diversity of Cancer-Net PCa-Data for potential biases. Cancer-Net PCa-Data is the first-ever public dataset of CDI$^s$ imaging data for PCa, and is a part of the global open-source initiative dedicated to advancement in machine learning and imaging research to aid clinicians in the global fight against cancer.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the following issues:

1. **Imaging Technology**: The paper describes synthetic correlated diffusion imaging (CDI$^s$), a recently introduced form of MRI that characterizes tissue properties through the joint correlation of diffusion signal attenuation across different Brownian motion sensitivities. Compared with established MRI techniques such as T2-weighted imaging (T2w), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced imaging (DCE), CDI$^s$ has shown stronger potential for clinical decision support in prostate cancer (PCa).
2. **Public Dataset**: Despite the potential of CDI$^s$, no CDI$^s$ patient data for PCa had previously been made publicly available. To advance prostate cancer research, the authors release Cancer-Net PCa-Data, the first public benchmark dataset of volumetric CDI$^s$ imaging data from PCa patients. The dataset contains CDI$^s$ volumetric images from 200 patient cases, with full annotations (gland masks, tumor masks, and a PCa diagnosis for each tumor).
3. **Data Analysis**: The paper analyzes the demographic and label region diversity of Cancer-Net PCa-Data to assess potential biases. Patients aged 60-69 make up the largest share of the cohort (56%), while patients under 50 are relatively few, so the age distribution is uneven. In addition, clinically significant tumors are far less common in the dataset than clinically insignificant ones, at a ratio of approximately 1:3.
4. **Potential Negative Social Impact**: The paper discusses potential negative social impacts of the dataset, including data misuse and over-reliance on models trained on it. To avoid these issues, it recommends that any model trained on this dataset be validated on real-world clinical data and used under expert supervision.

In summary, the paper aims to advance prostate cancer research by releasing the Cancer-Net PCa-Data dataset, and it documents the dataset's limitations and potential biases so that researchers can use it appropriately.
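The roughly 1:3 ratio of clinically significant to clinically insignificant tumors noted in the data analysis is something a classifier trained on this dataset may need to account for. The following is a minimal sketch of inverse-frequency class weighting, a common general-purpose technique and not something the paper prescribes. The label names, the counts, and the function `inverse_frequency_weights` are hypothetical and chosen only to reproduce a 1:3 ratio.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class so that every class carries equal total weight.

    Uses weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical tumor labels at a 1:3 ratio (illustrative counts, not the dataset's).
labels = ["significant"] * 25 + ["insignificant"] * 75
weights = inverse_frequency_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

With these weights the minority class is weighted three times as heavily as the majority class, so both classes contribute equally to a weighted loss. Whether such reweighting is appropriate depends on the clinical objective and should be checked against held-out data.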