Learning Better Universal Representations from Pre-trained Contextualized Language Models.

Yian Li, Hai Zhao
2020-01-01
Abstract: Pre-trained contextualized language models such as BERT have shown great effectiveness in a wide range of downstream Natural Language Processing (NLP) tasks. However, the effective representations these models offer target each token inside a sequence rather than the sequence as a whole, and the fine-tuning step feeds both sequences into the model at once. This leads to unsatisfying representations of sequences at different granularities. In particular, because these models take sentence-level input as the full training context, they perform worse on lower-level linguistic units (phrases and words). In this work, we present BURT (BERT inspired Universal Representation from Twin Structure), which generates universal, fixed-size representations for input sequences of any granularity, i.e., words, phrases, and sentences, using large-scale natural language inference and paraphrase data with multiple training objectives. BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset, respectively. We evaluate BURT on text similarity tasks at different granularities, including STS tasks, SemEval2013 Task 5(a), and several commonly used word similarity tasks. BURT substantially outperforms other representation models on sentence-level datasets and achieves significant improvements in word/phrase-level representation.
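The abstract describes the twin (Siamese) structure only at a high level. The following is a minimal sketch, using only the Python standard library, of the general idea. A single shared encoder maps a word, a phrase, or a sentence to a vector of the same fixed size, and two such vectors are compared with cosine similarity, as in text-similarity evaluation. Several pieces are assumptions for illustration, not BURT's actual implementation. `token_vector` is a hypothetical hash-based stand-in for BERT's contextual token vectors, mean pooling is assumed as the pooling step, and `pair_features` borrows the (u, v, |u - v|) feature layout from Sentence-BERT's NLI classification objective.

```python
import hashlib
import math
from typing import List

DIM = 16  # hypothetical vector size (BERT-base hidden states have 768 dimensions)


def token_vector(token: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a contextual token vector: a deterministic
    pseudo-random vector derived from a SHA-256 hash of the token."""
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()  # 32 bytes
    # Map each of the first `dim` bytes (0..255) into the range [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:dim]]


def encode(text: str) -> List[float]:
    """Shared (twin) encoder: mean-pool token vectors into one fixed-size
    vector, regardless of whether the input is a word, phrase, or sentence.
    Mean pooling is an assumption made for this sketch."""
    tokens = text.split()
    if not tokens:
        return [0.0] * DIM
    vectors = [token_vector(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def siamese_similarity(a: str, b: str) -> float:
    """Both inputs pass through the same encoder (shared weights), which is
    what makes the structure 'twin'."""
    return cosine(encode(a), encode(b))


def pair_features(u: List[float], v: List[float]) -> List[float]:
    """Sentence-BERT-style input to an NLI classifier (assumed here):
    the concatenation (u, v, |u - v|)."""
    return u + v + [abs(a - b) for a, b in zip(u, v)]


if __name__ == "__main__":
    for text in ["dog", "a small dog", "a small dog runs in the park"]:
        print(len(encode(text)))  # prints 16 for every granularity
    print(round(siamese_similarity("a small dog", "a small dog"), 6))  # prints 1.0
```

Because the encoder always returns a `DIM`-sized vector, the same similarity function can score word pairs, phrase pairs, and sentence pairs. This is the "universal, fixed-size" property the abstract emphasizes. In the real model, the hash-based vectors would be replaced by fine-tuned BERT outputs.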
What problem does this paper attempt to address?