Abstract: Deep learning-based computer-aided diagnosis techniques have shown encouraging performance in identifying and detecting lesions during endoscopy, and have reduced the rates of missed and false detections. However, existing methods have not adequately addressed the interpretability of model results. This shortcoming shows up directly as a significant bias in the localization of feature representations: even models with good recognition performance make severe feature localization errors, particularly for lesions with subtle morphological features, and this hinders their clinical deployment. To alleviate this problem, we propose a method that corrects the localization bias in the feature representations of recognition models for cancer-related lesions, which are difficult to label and identify accurately in clinical practice. The optimization is performed during training through a proposed data augmentation method and an auxiliary loss function based on clinical priors. The data augmentation method, called partial jigsaw, “breaks” the spatial structure of lesion-independent image blocks and enriches the data feature space, which decouples the model from interfering background features and focuses it on fine-grained lesion features. The annotation-based auxiliary loss function uses class activation maps for sample distribution correction and drives the model's localization representation, as shown in visualization maps, to converge on the gold-standard annotation. On a dataset constructed from 23 hospitals, the improved model reached an average precision of 92.79%, an F1-score of 92.61%, and an accuracy of 95.56%. In addition, we quantitatively evaluated the visualization feature maps. Compared with the baseline model, the improved model yielded significant offset correction in the visualized feature maps: the average visualization-weighted positive coverage improved from 51.85% to 83.76%. The proposed approach does not change the deployment capability or inference speed of the original model and can be incorporated into any state-of-the-art neural network. It also shows potential to provide more accurate localization inference and to assist clinical examination during endoscopy.
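The partial jigsaw idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `partial_jigsaw`, the `grid` parameter, the use of a binary lesion mask, and the rule "a block is lesion-independent if it contains no mask pixels" are all assumptions made for illustration. The sketch shuffles only the grid blocks that do not overlap the lesion annotation and leaves lesion blocks in place.

```python
import numpy as np

def partial_jigsaw(image, lesion_mask, grid=4, rng=None):
    """Shuffle grid blocks that contain no lesion pixels; keep the rest in place.

    Hypothetical sketch: `grid`, the mask format, and the block-selection
    rule are illustrative assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = lesion_mask.shape
    bh, bw = h // grid, w // grid  # block size; leftover rows/columns stay untouched
    # Top-left corners of blocks that do not overlap the lesion mask.
    free = [(r * bh, c * bw)
            for r in range(grid) for c in range(grid)
            if not lesion_mask[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw].any()]
    out = image.copy()
    order = rng.permutation(len(free))
    # Copy each lesion-free block from a randomly permuted source position.
    for (dy, dx), src in zip(free, order):
        sy, sx = free[src]
        out[dy:dy + bh, dx:dx + bw] = image[sy:sy + bh, sx:sx + bw]
    return out
```

Because the shuffle is a permutation of background blocks, the pixel content of the image is preserved while the background's spatial layout is broken, which is the property the abstract attributes to the augmentation.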

MTECC: A Multi-Task Learning Framework for Esophageal Cancer Analysis

Masked Autoencoders with Handcrafted Feature Predictions: Transformer for Weakly Supervised Esophageal Cancer Classification

Esophageal Squamous Cell Carcinoma Recognition Based on Lightweight Residual Networks with an Attention Mechanism

Multi-label Recognition of Cancer-Related Lesions with Clinical Priors on White-Light Endoscopy

Local-global multiple perception based deep multi-modality learning for sub-type of esophageal cancer classification

EsccNet: A Hybrid CNN and Transformers Model for the Classification of Whole Slide Images of Esophageal Squamous Cell Carcinoma

Predict EGFR Mutation Status on CT Images Using Texture and Contour Enhanced Masked Autoencoders

HC-MAE: Hierarchical Cross-attention Masked Autoencoder Integrating Histopathological Images and Multi-omics for Cancer Survival Prediction

MECFormer: Multi-task Whole Slide Image Classification with Expert Consultation Network

Wireless Capsule Endoscopy Anomaly Classification Via Dynamic Multi-Task Learning

Ensembled CNN with artificial bee colony optimization method for esophageal cancer stage classification using SVM classifier

Multi-Task Learning With Hierarchical Guidance for Locating and Stratifying Submucosal Tumors

Identification Lymph Node Metastasis in Esophageal Squamous Cell Carcinoma Using Whole Slide Images and a Hybrid Network of Multiple Instance and Transfer Learning

MGCT: Mutual-Guided Cross-Modality Transformer for Survival Outcome Prediction using Integrative Histopathology-Genomic Features

Tu2005 DEVELOPMENT AND VALIDATION OF A DEEP LEARNING ALGORITHM FOR DETECTION AND RECOGNITION OF PRECANCEROUS LESION IN ESOPHAGUS

Three feature streams based on a convolutional neural network for early esophageal cancer identification

A semi-supervised multi-task learning framework for cancer classification with weak annotation in whole-slide images

Segmentation Prompts Classification: A Nnunet-Based 3D Transfer Learning Framework with ROI Tokenization and Cross-Task Attention for Esophageal Cancer T-stage Diagnosis

Ensemble transformer-based multiple instance learning to predict pathological subtypes and tumor mutational burden from histopathological whole slide images of endometrial and colorectal cancer

Deep learning Instance Segmentation on Esophageal Squamous Cell Carcinoma detection

Eso-Net: A Novel 2.5D Segmentation Network with the Multi-Structure Response Filter for the Cancerous Esophagus