FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition

Xiaoqiang Wang, Lingfei Wu, Tengfei Ma, Bang Liu

2024-10-07

Abstract: Large language models (LLMs) are primarily evaluated by their overall performance on various text understanding and generation tasks. However, this paradigm fails to comprehensively differentiate fine-grained language and cognitive skills, leaving LLMs' capabilities insufficiently interpreted. In this paper, we present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation. Specifically, we formulate LLM evaluation in a multi-dimensional and explainable manner by dissociating language-related capabilities from cognition-related ones. In addition, by extracting the intermediate reasoning of LLMs, we further break down the process of applying a specific capability into three sub-steps: recalling relevant knowledge, utilizing knowledge, and solving problems. Finally, FAC$^2$E evaluates each sub-step of each fine-grained capability, providing a two-faceted diagnosis of LLMs. Using FAC$^2$E, we identify a common shortfall in knowledge utilization among models and propose a straightforward, knowledge-enhanced method to mitigate this issue. Our results not only show promising performance gains but also highlight a direction for future LLM advancements.

Computation and Language

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses the lack of fine-grained, cognition-level assessment in evaluating the capabilities of large language models (LLMs). Current evaluations focus mainly on overall performance across text understanding and generation tasks. This approach does not distinguish language skills from cognitive skills, so it offers little explanation of what LLMs can actually do.

To address this, the paper proposes FAC$^2$E, a framework for fine-grained and cognition-grounded evaluation of LLM capabilities. FAC$^2$E separates language-related capabilities from cognition-related ones, making the evaluation multi-dimensional and interpretable. In addition, by extracting the intermediate reasoning of LLMs, FAC$^2$E decomposes the application of a specific capability into three sub-steps: recalling relevant knowledge, utilizing knowledge, and solving the problem. Finally, FAC$^2$E evaluates each sub-step of each fine-grained capability, providing a two-faceted diagnosis of LLMs.

The main contributions of the paper include:

1. **Fine-grained assessment**: More detailed evaluation, obtained by separating language capabilities from cognitive ones.
2. **Intermediate reasoning extraction**: A finer breakdown of each evaluation, obtained by extracting the model's intermediate reasoning.
3. **Performance enhancement**: Identifying a common shortfall in knowledge utilization and proposing a knowledge-enhanced method to mitigate it.

Through these methods, FAC$^2$E not only demonstrates potential for performance improvement but also points to a direction for future LLM development.
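To make the evaluation layout concrete, the following is a minimal Python sketch of a capability × sub-step score grid. It is an illustration of the idea, not the FAC$^2$E implementation: the capability names, the [0, 1] score scale, the short sub-step keys, and the use of a plain mean for aggregation are all assumptions. It shows how scoring every sub-step of every capability lets the results be read along two axes, per capability (grouped into language and cognition) and per sub-step, which is where a shared weakness such as knowledge utilization would surface.

```python
from dataclasses import dataclass
from statistics import mean

# The three sub-steps from the summary; these short keys are our own labels.
SUB_STEPS = ("recall", "utilize", "solve")
# The two capability groups FAC^2E dissociates.
FACETS = ("language", "cognition")


@dataclass
class CapabilityScores:
    """Scores for one fine-grained capability (names used below are hypothetical)."""
    name: str
    facet: str                # "language" or "cognition"
    scores: dict[str, float]  # sub-step -> score in [0, 1]; the metric is an assumption

    def __post_init__(self) -> None:
        if self.facet not in FACETS:
            raise ValueError(f"unknown facet: {self.facet}")
        missing = set(SUB_STEPS).difference(self.scores)
        if missing:
            raise ValueError(f"{self.name} is missing sub-steps: {sorted(missing)}")


def diagnose(caps: list[CapabilityScores]) -> dict[str, dict[str, float]]:
    """Aggregate a non-empty capability x sub-step grid along each axis with a plain mean."""
    by_capability = {c.name: mean(c.scores[s] for s in SUB_STEPS) for c in caps}
    by_sub_step = {s: mean(c.scores[s] for c in caps) for s in SUB_STEPS}
    by_facet = {
        f: mean(by_capability[c.name] for c in caps if c.facet == f)
        for f in FACETS
        if any(c.facet == f for c in caps)
    }
    return {"capability": by_capability, "sub_step": by_sub_step, "facet": by_facet}


# Toy, made-up scores for illustration only (not results from the paper).
grid = [
    CapabilityScores("lang_cap_1", "language", {"recall": 0.8, "utilize": 0.4, "solve": 0.6}),
    CapabilityScores("cog_cap_1", "cognition", {"recall": 0.6, "utilize": 0.2, "solve": 0.4}),
]
report = diagnose(grid)
weakest = min(report["sub_step"], key=lambda s: report["sub_step"][s])
print(report)
print("weakest sub-step:", weakest)  # -> utilize
```

With these invented inputs the lowest sub-step average is `utilize`, which mirrors the kind of knowledge-utilization shortfall the paper reports; a real diagnosis would depend on how each sub-step is actually scored.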
