Abstract: Learning better representations is essential in medical image analysis for computer-aided diagnosis. However, learning discriminative semantic features is a major challenge because large-scale, well-annotated datasets are scarce. How, then, can a well-structured, categorizable embedding space be learned from limited-scale, unlabeled datasets? In this paper, we propose a novel clustering-guided twin-contrastive learning framework (CTCL) that learns discriminative representations of probe-based confocal laser endomicroscopy (pCLE) images for gastrointestinal (GI) tumor classification. Traditional contrastive learning considers only two randomly augmented views of the same instance. In contrast, CTCL uses clustering to align more semantically related, class-consistent samples, which improves intra-class tightness and inter-class variability and yields more informative representations. Furthermore, based on two inherent properties of CLE (geometric invariance and intrinsic noise), we propose to treat a CLE image rotated by any angle as the same instance as the original, and likewise a CLE image corrupted by different noise, which increases the variability and diversity of the samples. CTCL is optimized in an end-to-end expectation-maximization framework. Comprehensive experiments demonstrate that CTCL-based visual representations achieve competitive performance on each downstream task, together with greater robustness and transferability than existing state-of-the-art SSL and supervised methods. Notably, CTCL achieved 75.60%/78.45% and 64.12%/77.37% top-1 accuracy on the linear evaluation protocol and few-shot classification downstream tasks, respectively, which outperformed the previous best results by 1.27%/1.63% and 0.5%/3%, respectively.
The proposed method holds great potential to assist pathologists in achieving an automated, fast, and high-precision diagnosis of GI tumors and accurately determining different stages of tumor development based on CLE images.
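
The two ideas in the abstract, treating rotated or noise-perturbed CLE images as the same instance and using cluster assignments to select additional positives, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the loss, the noise model, or the clustering procedure, so the function names, the Gaussian noise, the restriction to 90-degree rotations, and the supervised-contrastive-style loss over cluster pseudo-labels are all assumptions made for illustration.

```python
import numpy as np


def rotated_view(image, rng):
    """Assumed augmentation: rotate by a random multiple of 90 degrees.

    The abstract speaks of rotation by any angle; multiples of 90 degrees
    are used here only to keep the sketch dependency-free.
    """
    return np.rot90(image, k=int(rng.integers(0, 4)))


def noisy_view(image, rng, sigma=0.05):
    """Assumed augmentation: additive zero-mean Gaussian noise."""
    return image + rng.normal(0.0, sigma, size=image.shape)


def cluster_guided_contrastive_loss(z, cluster_ids, temperature=0.1):
    """Hypothetical loss: samples sharing a cluster id are positives.

    z           : (n, d) array of embeddings
    cluster_ids : (n,) array of pseudo-labels from any clustering step
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length rows
    sim = (z @ z.T) / temperature                     # scaled cosine similarity
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)           # exclude self-pairs

    # Numerically stable log-softmax over all other samples.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Positives: same cluster, different sample.
    pos = (cluster_ids[:, None] == cluster_ids[None, :]) & ~self_mask
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)
    return float((per_anchor[valid] / pos_count[valid]).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))
    views = [rotated_view(image, rng), noisy_view(image, rng)]
    print([v.shape for v in views])

    z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(cluster_guided_contrastive_loss(z, np.array([0, 0, 1, 1])))
```

In this sketch the loss is lower when the cluster assignments agree with the geometry of the embedding space, which mirrors the stated goal of tighter intra-class and more separated inter-class structure. In an expectation-maximization-style loop, the cluster ids would be re-estimated from the current embeddings before each round of loss minimization.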

Clustering-Guided Twin Contrastive Learning for Endomicroscopy Image Classification
