Alignment Between the Decision-Making Logic of LLMs and Human Cognition: A Case Study on Legal LLMs

Lu Chen, Yuxuan Huang, Yixing Li, Yaohui Jin, Shuai Zhao, Zilong Zheng, Quanshi Zhang
2024-10-06
Abstract: This paper presents a method to evaluate the alignment between the decision-making logic of Large Language Models (LLMs) and human cognition in a case study on legal LLMs. Unlike traditional evaluations on language generation results, we propose to evaluate the correctness of the detailed decision-making logic of an LLM behind its seemingly correct outputs, which represents the core challenge for an LLM to earn human trust. To this end, we quantify the interactions encoded by the LLM as primitive decision-making logic, because recent theoretical achievements have proven several mathematical guarantees of the faithfulness of the interaction-based explanation. We design a set of metrics to evaluate the detailed decision-making logic of LLMs. Experiments show that even when the language generation results appear correct, a significant portion of the internal inference logic contains notable issues.
Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper aims to evaluate how well the decision-making logic of large language models (LLMs) aligns with human cognition, using legal LLMs as a case study. Specifically, the authors propose a method to quantify the correctness of an LLM's internal reasoning logic, rather than merely assessing the correctness of its generated outputs. The authors regard this as the core challenge for LLMs to earn human trust.

### Main Issues and Background

1. **Trust and Safety**: The credibility and safety of large language models in high-risk tasks are significant concerns. Traditional evaluation methods focus mainly on the correctness of generated outputs and overlook the detailed decision logic behind them.
2. **Case Study in the Legal Domain**: The authors choose legal LLMs as a case study because, even when the generated outputs are correct, legal LLMs may base their judgments on significantly erroneous information.
3. **Importance of Alignment**: Aligning the decision logic of AI models with human cognition is crucial for alleviating common concerns about AI models. Among people, this kind of alignment is usually established through communication, which is how they naturally come to trust each other.

### Research Objectives

- **Evaluate the Correctness of Decision Logic**: Go beyond the long-tail evaluation of generated outputs and focus on the correctness of the detailed decision logic that LLMs use behind those outputs.
- **Quantify the Degree of Alignment**: Design a set of metrics to assess the alignment between the decision logic of LLMs and human cognition.

### Methods and Contributions

- **Interaction-Based Explanation**: Use interaction-based explanation to evaluate the correctness of the decision logic encoded by LLMs. According to the authors, interactions can faithfully represent the primitive inference patterns encoded by deep neural networks (DNNs).
- **New Metrics**: Design new metrics that quantify reliable and unreliable interaction effects, thereby assessing how well the LLM's logic aligns with human cognition.
- **Experimental Validation**: Conduct experiments on English and Chinese legal LLMs, showing that although these models achieve high accuracy in judgment prediction, they still rely on a large number of incorrect interactions for reasoning.

### Experimental Results

- **Reliability Ratio**: The ratio of reliable interaction effects ($s^{\text{reliable}}$) measures the proportion of interaction effects that align with human cognition.
- **Distribution of Different Orders of Interaction**: Analyze how interaction effects are distributed across orders (the order of an interaction is the number of input tokens it involves) to evaluate the generalization ability of the decision logic used by LLMs.
- **Potential Representation Defects**: Identify potential representation defects in legal LLMs behind seemingly correct language generation results, including judgments influenced by unreliable sentiment words, incorrect entity matching, and occupational discrimination.

### Conclusion

By quantifying the alignment between the decision logic of LLMs and human cognition in the legal domain, this paper reveals that LLMs may have significant flaws in their internal reasoning logic even when their generated outputs are correct. These findings help improve the credibility and safety of LLMs in high-risk tasks.
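
To make the interaction-based metrics summarized above more concrete, here is a minimal Python sketch. It assumes the standard Harsanyi definition of an interaction, I(S) = Σ_{T⊆S} (−1)^{|S|−|T|} v(T), where v(T) is the model's scalar output when only the tokens in T are left unmasked. The names `harsanyi_interactions`, `reliable_ratio`, and `toy_v` are hypothetical, and `toy_v` is a hand-written stand-in for a real LLM. The reliability rule used here (an interaction counts as reliable only if every token in it is annotated as relevant) is a simplifying assumption for illustration; the paper's exact definition may differ.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a tuple, from the empty set up to the full set."""
    items = tuple(items)
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def harsanyi_interactions(n_tokens, v):
    """Return {S: I(S)} with I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    effects = {}
    for s in subsets(range(n_tokens)):
        effects[frozenset(s)] = sum(
            (-1) ** (len(s) - len(t)) * v(frozenset(t)) for t in subsets(s)
        )
    return effects


def reliable_ratio(effects, relevant):
    """Share of total interaction strength carried by interactions whose tokens
    are all in `relevant` (assumed rule, not necessarily the paper's)."""
    nonempty = {s: e for s, e in effects.items() if s}
    total = sum(abs(e) for e in nonempty.values())
    reliable = sum(abs(e) for s, e in nonempty.items() if s <= relevant)
    return reliable / total if total else 0.0


def toy_v(kept):
    """Hypothetical model confidence when only the tokens in `kept` are unmasked."""
    score = 0.0
    if {0, 1} <= kept:  # tokens 0 and 1 jointly support the judgment
        score += 2.0
    if 2 in kept:       # token 2 supports it on its own
        score += 1.0
    if {1, 3} <= kept:  # token 3 (annotated as irrelevant) interacts with token 1
        score -= 0.5
    return score


effects = harsanyi_interactions(4, toy_v)
ratio = reliable_ratio(effects, relevant=frozenset({0, 1, 2}))
```

In this toy setup the only non-zero interactions are {0, 1}, {2}, and {1, 3}, and the ratio is 3.0 / 3.5, about 0.86: most of the interaction strength comes from relevant tokens, but a visible share depends on a token that a human annotator would ignore. Exhaustive enumeration grows exponentially with the number of tokens, so this form is only practical for a handful of annotated tokens per input.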