Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification

Vivi Nastase, Paola Merlo

2024-07-25

Abstract: Analyses of transformer-based models have shown that they encode a variety of linguistic information from their textual input. While these analyses have shed light on the relation between linguistic information on one side and internal architecture and parameters on the other, a question remains unanswered: how is this linguistic information reflected in sentence embeddings? Using datasets consisting of sentences with known structure, we test to what degree information about chunks (in particular noun, verb or prepositional phrases), such as grammatical number or semantic role, can be localized in sentence embeddings. Our results show that such information is not distributed over the entire sentence embedding, but rather is encoded in specific regions. Understanding how the information from an input text is compressed into sentence embeddings helps us understand current transformer models and helps build future explainable neural models.

Computation and Language

What problem does this paper attempt to address?

The paper attempts to address the following issues:

1. **Information localization in sentence embeddings**: The researchers explore how information about sentence chunks (noun, verb, or prepositional phrases) and their properties, such as grammatical number or semantic role, is encoded in sentence embeddings. Specifically, they want to determine whether this information is localized in specific regions of the embedding vector rather than spread across the entire vector.
2. **Understanding information compression**: Knowing how information from the input text is compressed into sentence embeddings helps explain current transformer models and lays the groundwork for building interpretable neural models in the future.

To achieve these goals, the researchers designed a series of experiments, including:

- Sparsifying an encoder-decoder system for specific tasks to check whether information about different sentence components (e.g., noun phrases, verb phrases) can be found in different parts of the sentence embedding.
- Using datasets of sentences with known structure to test how this information is distributed within the sentence embeddings.
- Combining the sparsified encoder-decoder system with an additional layer to solve language tasks that depend on sentence components and their attributes.
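The localization question can be illustrated with a minimal sketch. It is not the authors' method: the paper sparsifies an encoder-decoder system, whereas this sketch swaps in a simpler, well-known technique, an L1-regularized logistic-regression probe trained by proximal gradient descent. It runs on synthetic 16-dimensional "embeddings" in which two hypothetical properties (a plural subject noun phrase and a plural verb) are planted in disjoint regions. All names, dimensions, regions and hyperparameters here are assumptions for illustration. If a property is localized, the probe's nonzero weights fall in a small set of dimensions, and probes for different chunks use non-overlapping sets.

```python
import math
import random

# Hypothetical setup: synthetic stand-ins for sentence embeddings, not real model output.
DIM = 16                  # toy embedding size (real sentence embeddings are much larger)
REGION_A = range(2, 5)    # dims where "subject NP is plural" is planted (assumption)
REGION_B = range(10, 13)  # dims where "verb is plural" is planted (assumption)
SHIFT = 1.5               # how strongly each label moves its region's dims


def make_synthetic_data(n, seed=0):
    """Gaussian vectors with two independent binary labels planted in disjoint regions."""
    rng = random.Random(seed)
    xs, labels_a, labels_b = [], [], []
    for _ in range(n):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        for d in REGION_A:
            x[d] += SHIFT * (2 * a - 1)
        for d in REGION_B:
            x[d] += SHIFT * (2 * b - 1)
        xs.append(x)
        labels_a.append(a)
        labels_b.append(b)
    return xs, labels_a, labels_b


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward 0 and zeroes values with |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_probe(xs, ys, lam=0.2, lr=0.1, steps=200):
    """L1-regularized logistic regression fitted with ISTA (proximal gradient descent)."""
    n = len(xs)
    w = [0.0] * DIM
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * DIM
        grad_b = 0.0
        for x, y in zip(xs, ys):
            r = sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x))) - y  # residual p - y
            grad_b += r
            for d in range(DIM):
                grad_w[d] += r * x[d]
        bias -= lr * grad_b / n  # the bias is not penalized
        w = [soft_threshold(w[d] - lr * grad_w[d] / n, lr * lam) for d in range(DIM)]
    return w


def support(w, eps=1e-9):
    """Embedding dimensions the sparse probe actually uses."""
    return {d for d, wd in enumerate(w) if abs(wd) > eps}


def jaccard(s, t):
    """Overlap between two sets of dimensions (0.0 = disjoint)."""
    return len(s & t) / len(s | t) if s | t else 0.0


xs, ya, yb = make_synthetic_data(600)
support_a = support(sparse_probe(xs, ya))
support_b = support(sparse_probe(xs, yb))
print("dims used for subject-NP number:", sorted(support_a))
print("dims used for verb number:", sorted(support_b))
print("overlap (Jaccard):", jaccard(support_a, support_b))
```

On real sentence embeddings the relevant regions are unknown in advance; the supports returned by such probes (or by a task-specifically sparsified encoder, as in the paper) are what one would inspect to judge whether chunk information is localized rather than spread across the whole vector. The L1 penalty is the design choice that makes this readable: it drives weights on uninformative dimensions exactly to zero instead of merely shrinking them.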
