Abstract: Pre-trained contextualized language models such as BERT have shown great effectiveness in a wide range of downstream Natural Language Processing (NLP) tasks. However, the representations these models offer target each token inside a sequence rather than the sequence as a whole, and the fine-tuning step takes both sequences as input at the same time, which leads to unsatisfying representations for sequences of different granularities. In particular, because these models take sentence-level representations as the full training context, they perform worse on lower-level linguistic units (phrases and words). In this work, we present BURT (BERT-inspired Universal Representation from Twin Structure), which is capable of generating universal, fixed-size representations for input sequences of any granularity, i.e., words, phrases, and sentences, using large-scale natural language inference and paraphrase data with multiple training objectives. BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset. We evaluate BURT on text similarity tasks at different granularities, including STS tasks, SemEval-2013 Task 5(a), and several commonly used word similarity tasks, where BURT substantially outperforms other representation models on sentence-level datasets and achieves significant improvements in word/phrase-level representation.
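The twin-structure idea described in the abstract can be illustrated with a minimal sketch: one shared encoder maps an input of any length (word, phrase, or sentence) to a fixed-size vector, and two inputs are compared by cosine similarity. Everything in the block is an assumption made for illustration: `token_vector` is a toy hash-based stand-in for a BERT encoder, `DIM` is an arbitrary size, and mean pooling is one common pooling choice rather than a confirmed detail of BURT. No training objective is shown.

```python
import hashlib
import math

DIM = 8  # illustrative embedding size (assumption, not from the paper)


def token_vector(token):
    """Toy deterministic token embedding; a hypothetical stand-in for BERT."""
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
    # Map the first DIM bytes of the hash to values in [-1, 1].
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def encode(text):
    """Mean-pool token vectors into one fixed-size vector of length DIM."""
    tokens = text.split()
    if not tokens:
        raise ValueError("input must contain at least one token")
    vectors = [token_vector(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(left, right):
    """Twin structure: the same encoder (shared weights) embeds both inputs."""
    return cosine(encode(left), encode(right))


if __name__ == "__main__":
    # A word, a phrase, and a sentence all map to vectors of the same size.
    for text in ["bank", "river bank", "She sat on the river bank"]:
        print(len(encode(text)), text)
    print(similarity("river bank", "She sat on the river bank"))
```

Because both inputs pass through the same `encode` function, the sketch mirrors the weight sharing of a Siamese network; with a trained encoder in place of `token_vector`, the same `similarity` call would apply unchanged to word, phrase, and sentence pairs.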
