A Simple Framework for Depth-Augmented Contrastive Learning for Endoscopic Image Classification

Weihao Weng, Xin Zhu, Faouzi Alaya Cheikh, Mohib Ullah, Mitsuyoshi Imaizumi, Shigeyuki Murono, Satoshi Kubota
DOI: https://doi.org/10.1109/tim.2024.3470015
IF: 5.6
2024-10-16
IEEE Transactions on Instrumentation and Measurement
Abstract: This article introduces a simple framework for depth-augmented contrastive learning (SimDCL), a novel approach to enhance endoscopic image classification by incorporating depth information. Unlike traditional methods that struggle with the absence of depth in 2-D endoscopic images, SimDCL leverages a depth estimation technique trained exclusively on da Vinci Xi endoscope data. This method not only addresses the challenge of obtaining accurate depth data for regions like the pharynges or larynges but also presents the information in a manner that aligns with medical professionals' expertise. Specifically, we designed a loss function for self-supervised depth estimation (SSDE), which performs well when trained on public datasets and then applied to data without depth information. In addition, we developed an augmentation method and corresponding loss function that utilize this depth information to improve the accuracy of endoscopic image classification. The evaluation involved a private dataset of 199 flexible endoscopic evaluation of swallowing (FEES) video images for training and 40 independent FEES video images for testing, along with two public datasets (Nerthus and Kvasir). SimDCL achieved an accuracy of 73.0% on FEES, 72.7% on Nerthus, and 81.6% on Kvasir, surpassing existing methods (CCSSL, CoMatch, and FixMatch) by margins of 9.2%, 12.1%, and 17.8% on FEES; 9.82%, 11.33%, and 11.67% on Nerthus; and 4.21%, 5.42%, and 9.97% on Kvasir, respectively.
engineering, electrical & electronic; instruments & instrumentation