Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks. However, they can be easily misled by unfaithful arguments during conversations, even when their original statements are correct. To this end, we investigate the problem of maintaining faithful integrity in LLMs. This involves ensuring that LLMs adhere to their faithful statements in the face of opposing arguments and are able to correct their incorrect statements when presented with faithful arguments. In this work, we propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation (AFICE), which aims to align LLM responses with faithful integrity. Specifically, AFICE first designs a Bilateral Confidence Estimation (BCE) approach for estimating the uncertainty of each response generated by the LLM given a specific context. BCE simultaneously estimates the model's confidence in the question, based on the internal states during decoding, and its confidence in the answer, based on cumulative probability ratios. With BCE, we construct a conversational preference dataset composed of context, original statement, and argument, which is used to align the LLM for faithful integrity via Direct Preference Optimization (DPO). Extensive experimental results on a wide range of benchmarks demonstrate significant improvements in the LLM's ability to maintain faithful responses when encountering opposing arguments, ensuring both the practical utility and trustworthiness of LLMs in complex interactive settings. Code and data will be released at https://github.com/zhaoy777/AFICE.git
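For readers unfamiliar with the alignment step named in the abstract, the following is a minimal sketch of the standard per-example DPO objective, not the AFICE implementation. It assumes the summed token log-probabilities of the preferred and dispreferred responses are already available from the policy and from a frozen reference model. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative assumptions and do not come from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta is a hypothetical default, not a value taken from the paper.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of margins; positive when the policy prefers
    # the chosen response more strongly than the reference does.
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)) written as a numerically stable softplus(-logits).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Usage: the policy equals the reference, so the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Usage: the policy has shifted toward the chosen response, so the loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a setting like the one the abstract describes, the chosen response would presumably be the one that keeps a correct statement (or corrects an incorrect one) and the rejected response the one that yields to an unfaithful argument. How those pairs are selected from the confidence estimates is specific to the paper and is not reproduced here.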

Towards Faithful Dialogues Via Focus Learning

Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation

Small Changes Make Big Differences: Improving Multi-turn Response Selection in Dialogue Systems Via Fine-Grained Contrastive Learning

FastLearn: A Rapid Learning Agent for Chat Models to Acquire Latest Knowledge

Learning Multi-turn Response Selection in Grounded Dialogues with Reinforced Knowledge and Context Distillation

Knowledge-Grounded Dialogue with Reward-Driven Knowledge Selection

Mitigating Large Language Model Hallucination with Faithful Finetuning

Learning to Express in Knowledge-Grounded Conversation

SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks

Token-level Direct Preference Optimization

Context-DPO: Aligning Language Models for Context-Faithfulness

Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

RA2FD: Distilling Faithfulness into Efficient Dialogue Systems

Exploring Dense Retrieval for Dialogue Response Selection

Enhancing Dialogue Generation Via Multi-Level Contrastive Learning

UFI4ER: an Utterance-Level Feature Dynamic Interaction Model for Cognition-Enhanced Empathetic Response Generation

A Knowledge Driven Dialogue Model with Reinforcement Learning

Aligning Large Language Models for Faithful Integrity Against Opposing Argument

Focused Large Language Models are Stable Many-Shot Learners

Continue or SHIFT: Learning Conversational Patterns for Dialogue Generation

Direct Preference Optimization Using Sparse Feature-Level Constraints