PolypDB: A Curated Multi-Center Dataset for Development of AI Algorithms in Colonoscopy

Debesh Jha, Nikhil Kumar Tomar, Vanshali Sharma, Quoc-Huy Trinh, Koushik Biswas, Hongyi Pan, Ritika K. Jha, Gorkem Durak, Alexander Hann, Jonas Varkey, Hang Viet Dao, Long Van Dao, Binh Phuc Nguyen, Khanh Cong Pham, Quang Trung Tran, Nikolaos Papachrysos, Brandon Rieders, Peter Thelin Schmidt, Enrik Geissler, Tyler Berzin, Pål Halvorsen, Michael A. Riegler, Thomas de Lange, Ulas Bagci

2024-08-19

Abstract: Colonoscopy is the primary method for the examination, detection, and removal of polyps. Regular screening helps detect and prevent colorectal cancer at an early, curable stage. However, challenges such as variation in endoscopists' skills, the quality of bowel preparation, and the complex nature of the large intestine lead to a high polyp miss rate. Missed polyps can later develop into cancer, which underscores the importance of improving detection methods. A computer-aided diagnosis system can support physicians by helping to detect overlooked polyps. However, one of the important challenges in developing novel deep learning models for automatic polyp detection and segmentation is the lack of publicly available, large, diverse, multi-center datasets. To address this gap, we introduce PolypDB, a large-scale publicly available dataset that contains 3934 still polyp images from real colonoscopy videos, together with their corresponding ground truth, for designing efficient polyp detection and segmentation architectures. The dataset has been developed and verified by a team of 10 gastroenterologists. PolypDB comprises images from five modalities, namely Blue Light Imaging (BLI), Flexible Imaging Color Enhancement (FICE), Linked Color Imaging (LCI), Narrow Band Imaging (NBI), and White Light Imaging (WLI), and from three medical centers in Norway, Sweden, and Vietnam. We therefore split the dataset by modality and by medical center for modality-wise and center-wise analysis. We provide a benchmark on each modality using eight popular segmentation methods and six standard polyp detection methods. Furthermore, we provide a center-wise benchmark under federated learning settings. Our dataset is public and can be downloaded at https://osf.io/pr7ms/.
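The modality-wise and center-wise splits described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the `ImageRecord` type, the `group_by` helper, the file names, and the center labels are hypothetical and do not reflect the actual file layout or metadata format of PolypDB.

```python
from collections import defaultdict
from typing import NamedTuple


class ImageRecord(NamedTuple):
    """Hypothetical per-image metadata; not the dataset's real schema."""
    filename: str
    modality: str  # one of "BLI", "FICE", "LCI", "NBI", "WLI"
    center: str    # placeholder center label


def group_by(records, key):
    """Group image filenames by a metadata field such as 'modality' or 'center'."""
    groups = defaultdict(list)
    for record in records:
        groups[getattr(record, key)].append(record.filename)
    return dict(groups)


# Made-up example records, used only to exercise the helper.
records = [
    ImageRecord("img_0001.jpg", "WLI", "center_a"),
    ImageRecord("img_0002.jpg", "NBI", "center_a"),
    ImageRecord("img_0003.jpg", "WLI", "center_b"),
    ImageRecord("img_0004.jpg", "BLI", "center_c"),
]

by_modality = group_by(records, "modality")  # subsets for modality-wise analysis
by_center = group_by(records, "center")      # subsets for center-wise analysis
```

In a center-wise federated setting, each entry of `by_center` would play the role of one client's local data; how the authors actually partition and train is described in the paper, not here.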

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the challenge of polyp detection in colonoscopy. Specifically:

1. **Improve the polyp detection rate**: Colonoscopy is the main method for the prevention and early detection of colorectal cancer (CRC), but in practice it has a relatively high miss rate. This is mainly due to differences in endoscopists' skill levels, the quality of bowel preparation, and the complex structure of the colon. Missed polyps may develop into cancer, so improving the accuracy of detection is crucial.

2. **Lack of multi-center, large-scale, and diverse public datasets**: An important challenge in developing new deep learning models for automatic polyp detection and segmentation is the lack of publicly available, multi-center, large-scale, and diverse datasets. Existing datasets are either small or lack diversity, so they are insufficient for training models and validating their generalization ability.

To address these problems, the paper introduces **PolypDB**, a large-scale public dataset containing 3,934 polyp images from real colonoscopy videos and their corresponding ground-truth annotations. The dataset was developed and verified by a team of 10 gastroenterologists and contains images from three medical centers in Norway, Sweden, and Vietnam. It covers five imaging modalities: blue light imaging (BLI), flexible spectral imaging color enhancement (FICE), linked color imaging (LCI), narrow band imaging (NBI), and white light imaging (WLI). Its aim is to support the design of efficient polyp detection and segmentation architectures.

By providing such a comprehensive dataset, the paper hopes to promote the development of computer-aided diagnosis systems, thereby improving the diagnostic performance of colonoscopy, reducing the miss rate, and ultimately reducing the incidence and mortality of colorectal cancer.

