GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection

Debesh Jha, Vanshali Sharma, Neethi Dasu, Nikhil Kumar Tomar, Steven Hicks, M.K. Bhuyan, Pradip K. Das, Michael A. Riegler, Pål Halvorsen, Ulas Bagci, Thomas de Lange

2023-08-18

Abstract: Integrating real-time artificial intelligence (AI) systems into clinical practice faces challenges such as scalability and acceptance. These challenges include data availability, biased outcomes, data quality, lack of transparency, and underperformance on unseen datasets from different distributions. The scarcity of large-scale, precisely labeled, and diverse datasets is the major challenge for clinical integration. This scarcity is also due to legal restrictions and the extensive manual effort required from clinicians for accurate annotation. To address these challenges, we present GastroVision, a multi-center open-access gastrointestinal (GI) endoscopy dataset that includes different anatomical landmarks, pathological abnormalities, polyp removal cases, and normal findings (a total of 27 classes) from the GI tract. The dataset comprises 8,000 images acquired from Bærum Hospital in Norway and Karolinska University Hospital in Sweden and was annotated and verified by experienced GI endoscopists. Furthermore, we validate the significance of our dataset with extensive benchmarking based on popular deep learning-based baseline models. We believe our dataset can facilitate the development of AI-based algorithms for GI disease detection and classification. Our dataset is available at https://osf.io/84e7f/.

Image and Video Processing, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper primarily attempts to address the challenges faced when integrating real-time artificial intelligence (AI) systems into clinical practice, particularly issues such as data availability, data quality, lack of transparency, and poor performance on differently distributed datasets. Specifically, the paper aims to solve these problems through the following aspects:

1. **Data Scarcity**: Large-scale, precisely annotated, and diverse endoscopic image datasets are very scarce, which limits the application of AI technology in clinical settings. Additionally, legal restrictions and the need for extensive manual annotation also increase the difficulty of data acquisition.
2. **Algorithm Bias**: Existing AI models mostly rely on data from a single center, leading to decreased effectiveness when dealing with diverse patient populations. A single, non-blinded endoscopist may introduce personal preferences, resulting in algorithm bias.
3. **Data Collection and Validation**: Bias issues may arise throughout the process from data collection to research design, data preprocessing, algorithm design, and implementation, affecting the final model's performance.

To address the above issues, the authors propose a multi-center open-access gastrointestinal endoscopic image dataset named GastroVision, which includes 8,000 images from two hospitals in Norway and Sweden. The dataset covers 27 different anatomical landmarks, pathological abnormalities, polypectomy cases, and normal findings. It is annotated and validated by experienced gastrointestinal endoscopists and has been extensively validated based on popular deep learning benchmark models. The authors believe that this dataset can promote the development of AI-based algorithms for gastrointestinal disease detection and classification.
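
A multi-class image dataset of this kind is usually benchmarked on a held-out split, and with 27 classes of uneven size it helps to split each class separately. The following is a minimal sketch, not the authors' procedure: it assumes the images have been downloaded into one sub-folder per class (a hypothetical layout, not confirmed by the text above), and the function names `index_by_class` and `stratified_split` are illustrative only.

```python
import random
from collections import defaultdict
from pathlib import Path

# Assumed image extensions; adjust to whatever the downloaded files use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_by_class(root):
    """Map each class name (sub-folder name) to a sorted list of image paths."""
    by_class = defaultdict(list)
    for path in sorted(Path(root).glob("*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_class[path.parent.name].append(path)
    return dict(by_class)


def stratified_split(by_class, val_fraction=0.2, seed=0):
    """Split every class separately so small classes are not lost from training."""
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        paths = list(by_class[label])
        rng.shuffle(paths)
        # Hold out at least one image when possible, but never the whole class.
        n_val = max(1, round(len(paths) * val_fraction))
        n_val = max(0, min(n_val, len(paths) - 1))
        val += [(p, label) for p in paths[:n_val]]
        train += [(p, label) for p in paths[n_val:]]
    return train, val
```

Calling `stratified_split(index_by_class("GastroVision"))` (the folder name is a placeholder) returns two lists of `(path, label)` pairs. Printing the length of each class list before splitting is a quick way to spot heavily under-represented classes.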
