Knowledge Extraction and Distillation from Large-Scale Image-Text Colonoscopy Records Leveraging Large Language and Vision Models

Shuo Wang, Yan Zhu, Xiaoyuan Luo, Zhiwei Yang, Yizhe Zhang, Peiyao Fu, Manning Wang, Zhijian Song, Quanlin Li, Pinghong Zhou, Yike Guo
2023-10-17
Abstract: The development of artificial intelligence systems for colonoscopy analysis often necessitates expert-annotated image datasets. However, limitations in dataset size and diversity impede model performance and generalisation. Image-text colonoscopy records from routine clinical practice, comprising millions of images and text reports, serve as a valuable data source, though annotating them is labour-intensive. Here we leverage recent advancements in large language and vision models and propose EndoKED, a data mining paradigm for deep knowledge extraction and distillation. EndoKED automates the transformation of raw colonoscopy records into image datasets with pixel-level annotation. We validate EndoKED using multi-centre datasets of raw colonoscopy records (~1 million images), demonstrating its superior performance in training polyp detection and segmentation models. Furthermore, the EndoKED pre-trained vision backbone enables data-efficient and generalisable learning for optical biopsy, achieving expert-level performance in both retrospective and prospective validation.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the data annotation bottleneck in AI systems for polyp detection and diagnosis in colonoscopy. It proposes EndoKED, a paradigm for knowledge extraction and distillation that uses large-scale image-text colonoscopy records to create datasets with pixel-level annotations automatically. The goal is to reduce the cost of manual annotation, improve model performance and generalization, and support tasks such as polyp detection, segmentation, and optical biopsy. The approach has four components:

1. **Knowledge Extraction**: Large language models (LLMs) extract report-level lesion labels from free-text reports.
2. **Multiple Instance Learning**: Report-level labels are converted into image-level labels to locate the images that are likely to contain lesions.
3. **Weakly Supervised Learning**: Image-level labels are further converted into pixel-level masks for training segmentation models.
4. **Pre-trained Visual Models**: Vision models pre-trained on large-scale colonoscopy records enable data-efficient and generalizable learning, reaching expert-level performance in optical biopsy.

In summary, this research develops an automated method that makes use of the large volume of existing unannotated colonoscopy records, reducing the cost and time required to train high-quality AI systems.
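
To make the multiple instance learning step more concrete, the following is a minimal sketch of how a report-level label might be propagated to the images in that report. It is not the authors' implementation: it assumes a generic max-pooling MIL formulation, and the function names (`bag_score`, `propagate_labels`), the per-image scores, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import List


def bag_score(image_scores: List[float]) -> float:
    """Max-pooling MIL (assumed here): a report is as suspicious as its
    most suspicious image. An empty report scores 0.0."""
    return max(image_scores) if image_scores else 0.0


def propagate_labels(
    report_label: int,
    image_scores: List[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[int]:
    """Turn one report-level label into image-level pseudo-labels.

    report_label: 1 if the report mentions a lesion, else 0.
    image_scores: per-image lesion scores from some image classifier
                  (assumed to lie in [0, 1]).
    """
    # A negative report implies that none of its images shows a lesion.
    if report_label == 0:
        return [0] * len(image_scores)

    # In a positive report, images scoring above the threshold are marked.
    labels = [1 if score >= threshold else 0 for score in image_scores]

    # A positive report must contain at least one positive image, so fall
    # back to the highest-scoring image when nothing clears the threshold.
    if image_scores and not any(labels):
        top = max(range(len(image_scores)), key=image_scores.__getitem__)
        labels[top] = 1
    return labels


if __name__ == "__main__":
    print(propagate_labels(1, [0.9, 0.2, 0.6]))
    print(propagate_labels(0, [0.9, 0.2]))
```

The fallback to the top-scoring image encodes the standard MIL assumption that a positive bag contains at least one positive instance. In practice the per-image scores would come from a classifier trained jointly with the bag-level objective, which this sketch leaves out.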