Abstract: Natural language processing (NLP) tasks tend to suffer from a paucity of suitably annotated training data, hence the recent success of transfer learning across a wide variety of them. The typical recipe involves: (i) training a deep, possibly bidirectional, neural network with an objective related to language modeling, for which training data is plentiful; and (ii) using the trained network to derive contextual representations that are far richer than standard linear word embeddings such as word2vec, and thus result in important gains. In this work, we wonder whether the opposite perspective is also true: can contextual representations trained for different NLP tasks improve language modeling itself? Since language models (LMs) are predominantly locally optimized, other NLP tasks may help them make better predictions based on the entire semantic fabric of a document. We test the performance of several types of pre-trained embeddings in neural LMs, and we investigate whether it is possible to make the LM more aware of global semantic information through embeddings pre-trained with a domain classification model. Initial experiments suggest that as long as the proper objective criterion is used during training, pre-trained embeddings are likely to be beneficial for neural language modeling.

What problem does this paper attempt to address?

This paper asks whether contextual representations trained for other natural language processing (NLP) tasks can be used to improve language modeling itself. Specifically, the authors explore transferring knowledge from other NLP tasks into a language model (LM) through pre-trained embeddings, with the aim of making the LM more aware of global semantic information and thereby improving its performance. Because LMs are predominantly optimized for local prediction, other NLP tasks may help them make better predictions based on the overall semantic structure of a document.

The paper notes that although LMs can learn rich local context when trained on large amounts of data, they may overlook global semantic information. For example, a content word such as "hurricane" may be impossible to predict accurately from local context alone, because its appearance depends on the overall background of the text. The authors therefore propose "reverse transfer learning": transferring knowledge from other NLP tasks to the LM to compensate for this weakness.

To test this hypothesis, the authors experiment with several types of pre-trained embeddings, including embeddings trained on local context (such as word2vec) and embeddings intended to capture global semantic information (such as embeddings trained with a domain classification model). The results suggest that pre-trained embeddings improve the LM more effectively when the pre-training task is closely related to the target task. In particular, embeddings trained with a bidirectional language model can also significantly reduce perplexity on smaller data sets, which indicates that appropriate pre-training can help even when the amount of data is limited.
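Since the results are reported in terms of perplexity, a small worked example may help make the metric concrete. The sketch below uses the standard definition (the exponential of the average negative log-probability per token). The function name `perplexity` and all probability values are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
import math

def perplexity(token_probs):
    """Perplexity: exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities a model assigns to each token of one 4-token
# sentence. The last token stands in for a content word such as "hurricane".
local_only = [0.2, 0.1, 0.25, 0.001]   # content word is nearly unpredictable
with_global = [0.2, 0.1, 0.25, 0.05]   # content word is more predictable

ppl_local = perplexity(local_only)
ppl_global = perplexity(with_global)

# A lower value means the model is less "surprised" by the text.
print(f"local only:  {ppl_local:.1f}")
print(f"with global: {ppl_global:.1f}")
```

In this toy setting, raising the probability of a single hard-to-predict content word is enough to lower the perplexity of the whole sentence, which is the kind of gain the summary attributes to embeddings that carry global semantic information.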

Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?
