Abstract: Large language models (LLMs) have shown a range of abilities in natural language processing, including on problems involving causality. It is not obvious that LLMs have a command of causality, since pretrained models usually operate on statistical associations and do not focus on causes and effects in sentences. Probing how LLMs internally manipulate causality is therefore necessary. This paper proposes a novel approach to probe causality manipulation hierarchically, by providing different shortcuts to models and observing their behavior. We exploit retrieval-augmented generation (RAG) and in-context learning (ICL) for models on a designed causality classification task. We conduct experiments on mainstream LLMs, including GPT-4 and some smaller and domain-specific models. Our results suggest that LLMs can detect entities related to causality and recognize direct causal relationships. However, LLMs lack specialized cognition for causality, merely treating it as part of the global semantics of the sentence.

What problem does this paper attempt to address?

This paper investigates how large language models (LLMs) internally handle causal relationships. Although LLMs perform well on natural language processing tasks, including some causal reasoning tasks, how they handle causal relationships internally remains unclear. Because these models have complex structures and very large numbers of parameters, directly reconstructing or closely studying their internal mechanisms is extremely costly, and advanced architectures such as Mixture-of-Experts (MoE) make detailed exploration even harder. In addition, the technical details of some existing models are not public, which further complicates research. To address this, the paper proposes a probing method: it constructs a classification dataset for detecting causal entities in sentences and the relationships between them, provides "shortcuts" hierarchically (such as Retrieval-Augmented Generation (RAG) and In-Context Learning (ICL)) to guide model behavior, and observes how performance changes under the different shortcuts to infer how LLMs handle causal relationships internally. The main contributions of the paper are:
1. A classification dataset built specifically for probing how models handle causal relationships.
2. A method that provides "shortcuts" hierarchically in order to observe and analyze LLM performance under different conditions.
3. Experimental results showing that, although LLMs can identify causal entities and, to a certain extent, understand causal relationships, they lack specialized cognition for causality and mainly rely on the global semantics of the sentence to make judgments.
Through this study, the authors hope to reveal the deficiencies of LLMs in handling causal relationships and to provide a reference for improving model training.
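The idea of giving a model progressively richer shortcuts and comparing its accuracy at each level can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's actual setup: the label set (`causal` / `non-causal`), the prompt wording, the level names, the helper names (`build_prompt`, `accuracy_by_level`), and the keyword-based `toy_model`, which stands in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class Item:
    sentence: str
    label: str  # hypothetical label set: "causal" or "non-causal"


def build_prompt(item: Item, level: str, retrieved: str = "",
                 demos: Sequence[Item] = ()) -> str:
    """Assemble a prompt for one shortcut level (hypothetical level names)."""
    parts = ["Does the text express a causal relationship? "
             "Answer causal or non-causal."]
    if level in ("rag", "rag+icl") and retrieved:
        # RAG-style shortcut: prepend retrieved background knowledge.
        parts.append("Background: " + retrieved)
    if level in ("icl", "rag+icl"):
        # ICL-style shortcut: prepend labeled demonstrations.
        for demo in demos:
            parts.append(f"Sentence: {demo.sentence}\nAnswer: {demo.label}")
    parts.append(f"Sentence: {item.sentence}\nAnswer:")
    return "\n\n".join(parts)


def accuracy_by_level(items: Sequence[Item], model: Callable[[str], str],
                      levels: Sequence[str], retrieved: str = "",
                      demos: Sequence[Item] = ()) -> Dict[str, float]:
    """Score the same items under each shortcut level."""
    scores = {}
    for level in levels:
        correct = sum(
            model(build_prompt(it, level, retrieved, demos)).strip() == it.label
            for it in items
        )
        scores[level] = correct / len(items)
    return scores


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: looks only at the final target sentence."""
    target = prompt.rsplit("Sentence:", 1)[1]
    return "causal" if "because" in target else "non-causal"


items = [
    Item("The road flooded because the storm lasted all night.", "causal"),
    Item("The storm arrived and the market opened.", "non-causal"),
    Item("Smoking leads to lung disease.", "causal"),
]
demos = [Item("The glass broke because it fell.", "causal")]
scores = accuracy_by_level(items, toy_model, ["none", "rag", "icl", "rag+icl"],
                           retrieved="Storms can cause flooding.", demos=demos)
```

Because `toy_model` ignores the added context, its accuracy is identical at every level. With a real model, the differences between levels are the signal of interest: a gain that appears only once a particular shortcut is supplied hints at what the model could not work out unaided.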

Probing Causality Manipulation of Large Language Models

Causality for Large Language Models

Cause and Effect: Can Large Language Models Truly Understand Causality?

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data

Evaluating Large Language Models for Causal Modeling

Can Large Language Models Learn Independent Causal Mechanisms?

From Pre-training Corpora to Large Language Models: What Factors Influence LLM Performance in Causal Discovery Tasks?

Large Language Model for Causal Decision Making

Is Knowledge All Large Language Models Needed for Causal Reasoning?

Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey

Causal Inference with Large Language Model: A Survey

CELLO: Causal Evaluation of Large Vision-Language Models

From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data

LLM4Causal: Democratized Causal Tools for Everyone via Large Language Model

Causal Parrots: Large Language Models May Talk Causality But Are Not Causal

Language Agents Meet Causality -- Bridging LLMs and Causal World Models

Counterfactual Causal Inference in Natural Language with Large Language Models

Causal Dataset Discovery with Large Language Models

Large Language Models for Constrained-Based Causal Discovery

Can Large Language Models Infer Causation from Correlation?