Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery

Long Bai, Mobarakol Islam, Hongliang Ren

2023-07-22

Abstract: The visual-question localized-answering (VQLA) system can serve as a knowledgeable assistant in surgical education. Besides providing text-based answers, the VQLA system can highlight the region of interest for better surgical scene understanding. However, deep neural networks (DNNs) suffer from catastrophic forgetting when learning new knowledge. Specifically, when DNNs learn on incremental classes or tasks, their performance on old tasks drops dramatically. Furthermore, due to medical data privacy and licensing issues, it is often difficult to access old data when updating continual learning (CL) models. Therefore, we develop a non-exemplar continual surgical VQLA framework to explore and balance the rigidity-plasticity trade-off of DNNs in a sequential learning paradigm. We revisit the distillation loss in CL tasks and propose rigidity-plasticity-aware distillation (RP-Dist) and self-calibrated heterogeneous distillation (SH-Dist) to preserve the old knowledge. The weight aligning (WA) technique is also integrated to adjust the weight bias between old and new tasks. We further establish a CL framework on three public surgical datasets in surgical settings where old and new surgical VQLA tasks share overlapping classes. With extensive experiments, we demonstrate that our proposed method reconciles learning and forgetting in continual surgical VQLA better than conventional CL methods. Our code is publicly accessible.

Computer Vision and Pattern Recognition, Computation and Language, Robotics

What problem does this paper attempt to address?

This paper addresses continual learning (CL) for visual question localized-answering (VQLA) systems in robotic surgery. In particular, it asks how a model can keep learning new tasks without catastrophically forgetting old knowledge. It focuses on three issues:

1. **Catastrophic forgetting**: When deep neural networks (DNNs) learn new tasks or classes, their performance on old tasks drops significantly. The problem is especially severe in medicine, where old data may be inaccessible because of privacy, storage, and licensing constraints.
2. **Overlapping classes**: In practice, new and old tasks may share classes. Conventional CL methods can become biased toward the old classes when handling these overlapping classes, so the new classes are learned poorly.
3. **Multi-task learning**: A VQLA system must give a text answer and also highlight the region of interest for better surgical scene understanding. It therefore has to handle classification and localization at the same time.

To address these issues, the paper proposes a non-exemplar continual surgical VQLA framework (CS-VQLA). By revisiting the distillation loss, it introduces the following methods:

- **Rigidity-Plasticity-Aware Distillation (RP-Dist)**: Adjusts the distillation temperature so that the model is more plastic on overlapping classes while remaining rigid on non-overlapping classes.
- **Self-Calibrated Heterogeneous Distillation (SH-Dist)**: Applies self-calibration operations to intermediate feature maps so that distillation can capture long-range contextual information.
- **Weight Aligning (WA)**: Adjusts the weight bias between old and new classes to keep the model from being biased toward the new classes.

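
The summary does not give the exact RP-Dist formulation. What follows is only a minimal sketch of the idea that the distillation temperature differs between overlapping and non-overlapping classes. It assumes logit-level knowledge distillation, meaning a KL divergence between temperature-softened teacher and student outputs, restricted to the old classes and with one temperature per class group. The function names, the `overlap_mask` argument, and the temperature values are hypothetical. Which group should receive the higher temperature is left as a parameter rather than taken from the paper.

```python
import math
from typing import List, Sequence


def tempered_log_softmax(logits: Sequence[float], temps: Sequence[float]) -> List[float]:
    """Log-softmax where logit i is divided by its own temperature temps[i]."""
    scaled = [z / t for z, t in zip(logits, temps)]
    m = max(scaled)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def rp_dist_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    overlap_mask: Sequence[bool],
    t_overlap: float,
    t_non_overlap: float,
) -> float:
    """KL(teacher || student) over the old classes with a class-dependent temperature.

    Hypothetical sketch, not the paper's exact loss: overlap_mask[i] is True if
    old class i also appears in the new task. Such classes use t_overlap, and
    all other old classes use t_non_overlap.
    """
    n_old = len(teacher_logits)
    # The student head also contains new classes; only its old-class logits are distilled.
    student_old = student_logits[:n_old]
    temps = [t_overlap if o else t_non_overlap for o in overlap_mask]
    log_p_t = tempered_log_softmax(teacher_logits, temps)
    log_p_s = tempered_log_softmax(student_old, temps)
    # KL divergence: sum_i p_t(i) * (log p_t(i) - log p_s(i))
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]        # frozen previous-task model: 3 old classes
    student = [1.5, 0.8, -0.5, 0.3]   # current model: 3 old classes + 1 new class
    overlap = [True, False, False]    # hypothetical: old class 0 reappears in the new task
    # Illustrative temperatures only; the paper's values are not given here.
    print(rp_dist_loss(student, teacher, overlap, t_overlap=4.0, t_non_overlap=1.0))
```

Setting `t_overlap == t_non_overlap` recovers ordinary temperature-scaled distillation. The usual T² gradient rescaling is omitted here.
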

Through these methods, the paper reports that the proposed approach outperforms conventional CL methods on three public surgical datasets, balancing learning and forgetting.
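
Weight aligning comes from class-incremental learning (Zhao et al., CVPR 2020). After training on a new task, the classifier weight vectors of the new classes are rescaled so that their mean L2 norm matches that of the old classes. This counteracts the larger norms that new classes tend to acquire. Below is a minimal pure-Python sketch of that standard form. It assumes a bias-free linear head whose rows are ordered old classes first. This summary does not state how CS-VQLA assigns overlapping classes to the old or new group, so the sketch simply treats any row index below `n_old` as old. All names are illustrative.

```python
import math
from typing import List


def l2_norm(vec: List[float]) -> float:
    """Euclidean norm of one class weight vector."""
    return math.sqrt(sum(v * v for v in vec))


def weight_align(weights: List[List[float]], n_old: int) -> List[List[float]]:
    """Weight aligning for a bias-free linear classifier head (sketch).

    weights[c] is the weight vector of class c. Rows [0, n_old) are treated as
    old classes (assumption: overlapping classes are included here) and rows
    [n_old, ...) as new classes. Assumes at least one old and one new class.
    New-class rows are scaled by gamma = mean(||w_old||) / mean(||w_new||).
    """
    old_norms = [l2_norm(w) for w in weights[:n_old]]
    new_norms = [l2_norm(w) for w in weights[n_old:]]
    gamma = (sum(old_norms) / len(old_norms)) / (sum(new_norms) / len(new_norms))
    old_rows = [list(w) for w in weights[:n_old]]            # copied unchanged
    new_rows = [[gamma * v for v in w] for w in weights[n_old:]]
    return old_rows + new_rows


if __name__ == "__main__":
    head = [[3.0, 4.0], [0.0, 5.0], [0.0, 10.0]]  # 2 old classes, 1 new class
    print(weight_align(head, n_old=2))  # [[3.0, 4.0], [0.0, 5.0], [0.0, 5.0]]
```

Only the new-class rows change. Because the rescaling is applied after training, it adds no extra loss term.
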

LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery

Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery

Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery

Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

MHKD-MVQA: Multimodal Hierarchical Knowledge Distillation for Medical Visual Question Answering

Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation

Self-Knowledge Distillation for Surgical Phase Recognition

Cross-domain visual prompting with spatial proximity knowledge distillation for histological image classification

Prior-Posterior Knowledge Prompting-and-Reasoning for Surgical Visual Question Localized-Answering

Candidate-Heuristic In-Context Learning: A new framework for enhancing medical visual question answering with LLMs

Brain-Inspired Continual Learning: Robust Feature Distillation and Re-Consolidation for Class Incremental Learning

Online Distillation with Continual Learning for Cyclic Domain Shifts

Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer

Reprogramming Distillation for Medical Foundation Models

Self-distillation for surgical action recognition