Abstract: Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP. Three main forces are driving the improvements in this area of research. First, more elaborate architectures are making better use of contextual information: instead of simply plugging in static pre-trained representations, representations are learned from the surrounding context in end-to-end trainable models with more intelligently designed language modelling objectives. Second, larger corpora are used as resources for pre-training large language models in a self-supervised fashion; these models are afterwards fine-tuned on supervised tasks. Third, advances in parallel computing as well as in cloud computing have made it possible to train these models with growing capacities in the same or even shorter time than previously established models. These three developments culminate in new state-of-the-art (SOTA) results being reported at an ever higher frequency. It is not always obvious where these improvements originate, as it is not possible to completely disentangle the contributions of the three driving forces. We set out to provide a clear and concise overview of several large pre-trained language models that achieved SOTA results in the last two years, with respect to their use of new architectures and resources. We want to clarify for the reader where the differences between the models lie, and we furthermore attempt to gain some insight into the individual contributions of lexical/computational improvements as well as of architectural changes. We explicitly do not intend to quantify these contributions, but rather see our work as an overview that identifies potential starting points for benchmark comparisons. Furthermore, we tentatively point to possibilities for improvement in the field of open-sourcing and reproducible research.

Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains

Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks

Pre-Trained Language Models and Their Applications

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

Investigation on task effect analysis and optimization strategy of multimodal large model based on Transformers architecture for various languages

GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain

Pre-Trained Models: Past, Present and Future

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

On the comparability of Pre-trained Language Models

Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field

Toward the Adoption of Explainable Pre-Trained Large Language Models for Classifying Human-Written and AI-Generated Sentences

From English To Foreign Languages: Transferring Pre-trained Language Models

The Dark Side of the Language: Pre-trained Transformers in the DarkNet

Matching domain experts by training from scratch on domain knowledge

Generalizing Question Answering System with Pre-trained Language Model Fine-tuning

Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social Commonsense

Pre-Training a Language Model Without Human Language

Improving Language Understanding by Generative Pre-Training

Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry