Abstract:

Introduction: This study presents COVID-Twitter-BERT (CT-BERT), a transformer-based model pre-trained on a large corpus of COVID-19 related Twitter messages. CT-BERT is designed for COVID-19 content, particularly from social media, and can be used for natural language processing tasks such as classification, question answering, and chatbots. This paper evaluates CT-BERT on several classification datasets and compares it with its base model, BERT-LARGE.

Methods: The authors evaluate CT-BERT on five classification datasets, one of which is in the target domain. Performance is compared against BERT-LARGE to measure the marginal improvement. The authors also describe the training process and the technical specifications of the model.

Results: CT-BERT outperforms BERT-LARGE on all five classification datasets, with a marginal improvement of 10-30%. The largest improvements are observed in the target domain. The authors report detailed performance metrics and discuss the significance of these results.

Discussion: The study demonstrates the potential of domain-specific pre-trained transformer models such as CT-BERT for COVID-19 related natural language processing tasks. CT-BERT improves classification performance on COVID-19 related content, especially content from social media. This is relevant to applications such as monitoring public sentiment and building chatbots that provide COVID-19 related information. More broadly, the results highlight the value of domain-specific pre-training when the target task lies in a specialized domain.

A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks

Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation

The Dark Side of the Language: Pre-trained Transformers in the DarkNet

Understanding Transformers for Bot Detection in Twitter

Investigation on task effect analysis and optimization strategy of multimodal large model based on Transformers architecture for various languages

Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality

On the Limitations of Sociodemographic Adaptation with Transformers

Transformer-based Multi-task Learning for Disaster Tweet Categorisation

Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers

CrisisTransformers: Pre-trained language models and sentence encoders for crisis-related social media texts

Performance Evaluations of Large Language Models for Customer Service

Practical Text Classification With Large Pre-Trained Language Models

Processing Natural Language on Embedded Devices: How Well Do Transformer Models Perform?

Database Tuning using Natural Language Processing

Analyzing COVID-19 Tweets with Transformer-based Language Models

Benchmarking Transformers-based models on French Spoken Language Understanding tasks

Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social Commonsense

Uncovering suggestions in MOOC discussion forums: a transformer-based approach

COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter

Pretraining on the Test Set Is All You Need

S4-Tuning: A Simple Cross-lingual Sub-network Tuning Method