Abstract: Large Language Models (LLMs) have catalyzed transformative advances across a spectrum of natural language processing tasks through few-shot or zero-shot prompting, bypassing the need for parameter tuning. While convenient, this modus operandi aggravates "hallucination" concerns, particularly given the enigmatic "black-box" nature behind their gigantic model sizes. Such concerns are exacerbated in high-stakes applications (e.g., healthcare), where unaccountable decision errors can lead to devastating consequences. In contrast, human decision-making relies on nuanced cognitive processes, such as the ability to sense and adaptively correct misjudgments through conceptual understanding. Drawing inspiration from human cognition, we propose an innovative metacognitive approach, dubbed CLEAR, to equip LLMs with capabilities for self-aware error identification and correction. Our framework facilitates the construction of concept-specific sparse subnetworks that illuminate transparent decision pathways. This provides a novel interface for model intervention after deployment. Our intervention offers compelling advantages: (i) at deployment or inference time, our metacognitive LLMs can self-consciously identify potential mispredictions with minimum human involvement, (ii) the model has the capability to self-correct its errors efficiently, obviating the need for additional tuning, and (iii) the rectification procedure is not only self-explanatory but also user-friendly, enhancing the interpretability and accessibility of the model. By integrating these metacognitive features, our approach pioneers a new path toward engendering greater trustworthiness and accountability in the deployment of LLMs.

What problem does this paper attempt to address?

This paper addresses the problem of identifying and correcting errors in large language models (LLMs) after deployment. Although LLMs have made significant progress on natural language processing tasks, their "black-box" nature and tendency to "hallucinate" make their decisions unreliable, especially in high-stakes applications such as healthcare, where errors can have serious consequences. The paper therefore proposes a metacognitive method named CLEAR that gives LLMs self-aware error identification and correction capabilities, improving the model's transparency, interpretability, and reliability.

### Main contributions of the paper:

1. **Metacognition**: At deployment or inference time, the framework autonomously detects potential mispredictions by measuring the logit entropy of key intermediate layers.
2. **Interpretability**: Because the decision pathway is transparent, users can trace a prediction back to the input, which increases trust in the model.
3. **Efficiency**: Once a misprediction is identified, the model architecture dynamically activates additional internal experts to refine concept perception, without further parameter tuning.
4. **Effectiveness**: Experiments on multiple real-world datasets show that this intervention consistently improves prediction accuracy at inference time across LLMs of different sizes and architectures.

### Problems Solved:

- **Black-box nature**: The complex internal structure of traditional LLMs makes it hard to intervene in a targeted manner and to locate the source of an error.
- **Dependence on experts**: Current error identification and correction methods usually require the participation of domain experts, which limits their scalability and automation.
- **Model complexity**: The large architecture of LLMs makes targeted intervention difficult.

By introducing the metacognitive mechanism, the paper provides a new solution that enables LLMs to automatically identify and correct errors after deployment, reducing the dependence on human experts and improving the model's transparency and reliability.
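
The two mechanisms named in the contributions, entropy-based detection and activation of additional experts, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `threshold` value, and the top-k routing rule are all assumptions chosen for illustration, and the summary does not specify how CLEAR computes or thresholds the entropy.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def needs_intervention(logits, threshold=0.5):
    """Flag a prediction as a potential misprediction when entropy is high.

    `threshold` is a hypothetical value; in practice it would be tuned.
    """
    return entropy(softmax(logits)) > threshold


def select_experts(router_scores, base_k=1, extra_k=1, uncertain=False):
    """Return indices of the top-scoring experts.

    Assumed routing rule: use `base_k` experts normally and activate
    `extra_k` more when the prediction was flagged as uncertain.
    No parameters are updated; only the set of active experts changes.
    """
    k = base_k + (extra_k if uncertain else 0)
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return ranked[:k]


# Usage: a peaked distribution passes, a flat one triggers extra experts.
confident = [10.0, 0.0, 0.0]
ambiguous = [0.0, 0.0, 0.0]
scores = [0.1, 0.7, 0.2]
for logits in (confident, ambiguous):
    flagged = needs_intervention(logits)
    print(flagged, select_experts(scores, uncertain=flagged))
```

The design point this sketch tries to capture is that the correction happens purely at inference time: a high-entropy signal changes which experts are consulted, while all weights stay fixed.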

Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

Efficiently Deploying LLMs with Controlled Risk

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning

LLaCA: Multimodal Large Language Continual Assistant

Metacognitive Myopia in Large Language Models

Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop

Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression

Optimizing Large Language Models for Dynamic Constraints through Human-in-the-Loop Discriminators

Optimizing Psychological Counseling with Instruction-Tuned Large Language Models

CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models

Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration

Introspective Tips: Large Language Model for In-Context Decision Making

R-Tuning: Instructing Large Language Models to Say `I Don't Know'

Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach