Abstract: A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious, i.e., the model might not rely on it when making predictions. In this paper, we try to find encodings that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model's representations. We contend that, if an encoding is used by the model, its removal should harm the performance on the chosen behavioral task. As a case study, we focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task. Experimentally, we find that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output. We also find that BERT uses a separate encoding of grammatical number for nouns and verbs. Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb.

What problem does this paper attempt to address?

This paper asks how pre-trained language models such as BERT internally represent and use linguistic features, taking grammatical number as its case. Conventional probing can show that a model encodes a feature, but not whether the model relies on that encoding when making predictions. The researchers propose a usage-based probing setup: they first select a behavioral task that cannot be solved without the linguistic property, and then attempt to remove the property by intervening on the model's representations. If the model actually uses an encoding, removing it should harm performance on the selected behavioral task. As a case study, they examine how BERT encodes grammatical number and how it uses this encoding to solve the number agreement task. The experiments show that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output, and that it uses separate encodings for nouns and verbs. They also identify the layers in which information about grammatical number is transferred from a noun to its head verb. These findings contribute to understanding how BERT uses linguistic information when solving natural language processing tasks. In short, the paper addresses the problem of determining which encodings of a linguistic feature a pre-trained model actually uses, and how those encodings function within the model.
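The remove-then-measure logic described above can be illustrated with a small sketch. It makes several assumptions that are not in the paper: the representations are synthetic Gaussian vectors rather than BERT hidden states, the linear encoding is estimated as the difference of class means (the abstract does not name the intervention the authors use), and a nearest-class-mean classifier stands in for the number agreement task. All function and variable names are hypothetical.

```python
import numpy as np


def remove_linear_direction(X, y):
    """Project out the direction separating the two classes.

    Assumption: the direction is the difference of class means, chosen
    for illustration only; it is not claimed to be the paper's method.
    """
    direction = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    direction = direction / np.linalg.norm(direction)
    # P = I - d d^T projects onto the orthogonal complement of d.
    projection = np.eye(X.shape[1]) - np.outer(direction, direction)
    return X @ projection, direction


def nearest_mean_accuracy(X, y):
    """Accuracy of a nearest-class-mean classifier (toy behavioral task)."""
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    closer_to_1 = np.linalg.norm(X - mu1, axis=1) < np.linalg.norm(X - mu0, axis=1)
    return float((closer_to_1.astype(int) == y).mean())


# Synthetic "representations": 0 = singular, 1 = plural (hypothetical labels).
rng = np.random.default_rng(0)
n, dim = 400, 16
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, dim))
X[:, 0] += 4.0 * (2 * y - 1)  # plant the property along a single axis

X_removed, direction = remove_linear_direction(X, y)
acc_before = nearest_mean_accuracy(X, y)
acc_after = nearest_mean_accuracy(X_removed, y)

print(f"accuracy before removal: {acc_before:.3f}")
print(f"accuracy after removal:  {acc_after:.3f}")
```

In this toy setting the task depends entirely on the planted direction, so accuracy should fall to roughly chance once that direction is projected out. If accuracy stayed high after removal, that would suggest the task does not rely on the removed encoding, which is the distinction the usage-based setup is designed to draw.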

Probing for the Usage of Grammatical Number
