Abstract: This paper presents a method to evaluate the alignment between the decision-making logic of Large Language Models (LLMs) and human cognition in a case study on legal LLMs. Unlike traditional evaluations on language generation results, we propose to evaluate the correctness of the detailed decision-making logic of an LLM behind its seemingly correct outputs, which represents the core challenge for an LLM to earn human trust. To this end, we quantify the interactions encoded by the LLM as primitive decision-making logic, because recent theoretical achievements have proven several mathematical guarantees of the faithfulness of the interaction-based explanation. We design a set of metrics to evaluate the detailed decision-making logic of LLMs. Experiments show that even when the language generation results appear correct, a significant portion of the internal inference logic contains notable issues.

What problem does this paper attempt to address?

This paper aims to evaluate the alignment between the decision-making logic of large language models (LLMs) and human cognition in the legal domain. Specifically, the authors propose a method to quantify the correctness of an LLM's internal reasoning logic, rather than merely assessing the correctness of its generated language outputs. Verifying that logic is the core challenge for LLMs to gain human trust.

### Main Issues and Background

1. **Trust and Safety**: The trustworthiness and safety of large language models in high-stakes tasks are significant concerns. Traditional evaluation methods focus mainly on the correctness of generated outputs and overlook the detailed decision-making logic behind them.
2. **Case Study in the Legal Domain**: The authors choose legal LLMs as a case study because a legal LLM may base its judgment on significantly erroneous information even when the generated output is correct.
3. **Importance of Alignment**: Aligning the decision-making logic of AI models with human cognition is crucial for alleviating common concerns about AI models. Among people, such alignment is typically reached through communication, which is how they come to trust one another.

### Research Objectives

- **Evaluate the Correctness of Decision Logic**: Go beyond traditional evaluation of generated outputs and examine the correctness of the detailed decision-making logic that an LLM uses to produce them.
- **Quantify the Degree of Alignment**: Design a set of metrics to assess the alignment between the decision-making logic of LLMs and human cognition.

### Methods and Contributions

- **Interaction-Based Explanation**: Use interaction-based explanation to evaluate the correctness of the decision-making logic encoded by LLMs. Interactions come with mathematical guarantees that they faithfully represent the primitive inference patterns of deep neural networks (DNNs).
- **New Metrics**: Design new metrics that quantify reliable and unreliable interaction effects, thereby assessing how well the LLM's logic aligns with human cognition.
- **Experimental Validation**: Conduct experiments on English and Chinese legal LLMs, showing that although these models achieve high accuracy in judgment prediction, they still rely on a large number of incorrect interactions for reasoning.

### Experimental Results

- **Reliability Ratio**: Propose the ratio of reliable interaction effects ($s^{\text{reliable}}$) to measure the proportion of interaction effects that align with human cognition.
- **Distribution of Different Orders of Interaction**: Analyze how interaction effects are distributed across orders to evaluate the generalization ability of the decision-making logic used by LLMs.
- **Potential Representation Defects**: Identify potential representation defects in legal LLMs behind seemingly correct language generation results, including judgments influenced by unreliable sentiment words, incorrect entity matching, and occupational discrimination.

### Conclusion

By quantifying the alignment between the decision-making logic of LLMs and human cognition in the legal domain, this paper reveals that LLMs may have significant flaws in their internal reasoning logic even when the generated outputs are correct. These findings can help improve the trustworthiness and safety of LLMs in high-stakes tasks.
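The summary above does not spell out how interactions or the reliability ratio are computed. As a rough illustration only, the sketch below assumes the Harsanyi interaction commonly used in the interaction-based explanation literature, $I(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} v(T)$, where $v(T)$ is the model output when only the tokens in $T$ are kept. The toy scoring function `toy_model_output`, the token list, and the rule "an interaction is reliable if it involves only tokens that a human annotator marked as relevant" are hypothetical stand-ins, not the paper's exact definitions.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def harsanyi_interactions(v, n):
    """Compute I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T) for all S."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def reliable_ratio(interactions, is_reliable):
    """Share of total absolute interaction strength judged reliable (assumed form)."""
    total = sum(abs(effect) for effect in interactions.values())
    if total == 0:
        return 0.0
    reliable = sum(abs(e) for S, e in interactions.items() if is_reliable(S))
    return reliable / total


# Hypothetical input: tokens 0 and 1 are legally relevant, token 2 is a sentiment word.
tokens = ["stole", "wallet", "angry"]
relevant = frozenset({0, 1})  # hypothetical human annotation


def toy_model_output(kept):
    """Toy stand-in for an LLM's confidence in a judgment given the kept tokens."""
    score = 0.0
    if {0, 1} <= kept:  # "stole" AND "wallet" together support the judgment
        score += 2.0
    if 2 in kept:       # the sentiment word also shifts the score (undesired)
        score += 1.0
    return score


interactions = harsanyi_interactions(toy_model_output, len(tokens))
ratio = reliable_ratio(interactions, lambda S: S <= relevant)

# Absolute interaction strength grouped by order (number of tokens involved).
strength_by_order = {}
for S, effect in interactions.items():
    strength_by_order[len(S)] = strength_by_order.get(len(S), 0.0) + abs(effect)
```

In this toy setup the only nonzero interactions are the pair ("stole", "wallet") with effect 2.0 and the single token "angry" with effect 1.0, so two thirds of the interaction strength counts as reliable. Enumerating all subsets is exponential in the number of tokens, so a sketch like this is only practical for a handful of input units.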

Alignment Between the Decision-Making Logic of LLMs and Human Cognition: A Case Study on Legal LLMs

Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models

Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates

Exploring the psychology of LLMs' Moral and Legal Reasoning

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction

Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications

LLMs for Relational Reasoning: How Far are We?

Argumentative Large Language Models for Explainable and Contestable Decision-Making

From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks

Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Beyond LLMs: Advancing the Landscape of Complex Reasoning

A Survey on Human-Centric LLMs

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

Human-Centered Design Recommendations for LLM-as-a-Judge

Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration

Should We Fear Large Language Models? A Structural Analysis of the Human Reasoning System for Elucidating LLM Capabilities and Risks Through the Lens of Heidegger's Philosophy