Abstract: In 2019, the coronavirus pandemic, also known as COVID-19, broke out. Many scientists believe that the pandemic originated in Wuhan, China, before spreading to other parts of the globe. To reduce the spread of the disease, decision makers encouraged measures such as hand washing, face masking, and social distancing. In early 2021, some countries, including the United States, began administering COVID-19 vaccines. Vaccination brought relief to the public; it also generated considerable debate between anti-vaccine and pro-vaccine groups. The controversy surrounding COVID-19 vaccines influenced many people's decisions to accept or reject vaccination. Because of data limitations, social media data, collected by live-streaming public tweets through an Application Programming Interface (API) search, is considered a viable and reliable resource for studying public opinion on COVID-19 vaccine hesitancy. Thus, this study examines three sentiment computation methods (Azure Machine Learning, VADER, and TextBlob) to analyze COVID-19 vaccine hesitancy. Five learning algorithms (Random Forest, Logistic Regression, Decision Tree, LinearSVC, and Naïve Bayes) were deployed with different combinations of three vectorization methods (Doc2Vec, CountVectorizer, and TF-IDF). Vocabulary normalization was threefold: Porter stemming, lemmatization, and Porter stemming with lemmatization. For each vocabulary normalization strategy, we designed, developed, and evaluated 42 models. The study shows that COVID-19 vaccine hesitancy slowly decreases over time, suggesting that the public is gradually becoming warmer and more optimistic toward COVID-19 vaccination. Moreover, combining Porter stemming and lemmatization increased model performance.
Finally, our experiments show that TextBlob + TF-IDF + LinearSVC performs best at classifying public sentiment as positive, neutral, or negative, with an accuracy, precision, recall, and F1 score of 0.96752, 0.96921, 0.92807, and 0.94702, respectively. That is, the best performance was achieved with TextBlob sentiment scores, TF-IDF vectorization, and the LinearSVC classification model. We also found that combining two vectorizations (CountVectorizer and TF-IDF) decreases model accuracy.
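
To make the four reported metrics concrete, the following is a minimal Python sketch of how accuracy, precision, recall, and F1 can be computed for a three-class sentiment task. It rests on assumptions that the abstract does not state: polarity scores are mapped to labels with a cut-off at zero, and the per-class scores are macro-averaged. The function names, the threshold, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a three-way label.

    The zero cut-off is an assumption made for illustration only.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def macro_scores(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical polarity scores standing in for a sentiment scorer's output.
polarities = [0.5, -0.3, 0.0, 0.8, -0.6, 0.0]
y_true = [label_from_polarity(p) for p in polarities]

# Hypothetical classifier predictions for the same six tweets.
y_pred = ["positive", "negative", "neutral", "neutral", "negative", "positive"]

scores = macro_scores(y_true, y_pred)
print(scores)
```

With macro-averaging, F1 is averaged per class, so it need not equal the harmonic mean of the averaged precision and recall.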

Multilingual Sentiment Analysis on Short Text Document Using Semi-Supervised Machine Learning

Machine Learning Techniques for Sentiment Analysis of COVID-19-Related Twitter Data

Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data

Sentiment Analysis on Social Media Against Public Policy Using Multinomial Naive Bayes

Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis

Sentiment Analysis of Short Texts Using SVMs and VSMs-Based Multiclass Semantic Classification

A Survey on Sentiment Analysis and Opinion Mining in Greek Social Media

Naïve-Bayes family for sentiment analysis during COVID-19 pandemic and classification tweets

Empirical Analysis of Supervised and Unsupervised Machine Learning Algorithms with Aspect-Based Sentiment Analysis

A Preliminary Study of Sentiment Analysis on COVID-19 News: Lesson Learned from Data Acquisition, Pre-Processing, and Descriptive Analytics

Sentiment Analysis on Social Media Content

Lexicon-Based Sentiment Analysis on Text Polarities with Evaluation of Classification Models

Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset

CovDLCNet: LSTM based deep learning network for multiclass sentiment analysis on COVID-19 public tweets

Sentiment Analysis Techniques and Application-Survey and Taxonomy

Sentiment Analysis of COVID-19 Tweets Using Deep Learning and Lexicon-Based Approaches

Optimizing Machine Learning-based Sentiment Analysis Accuracy in Bilingual Sentences via Preprocessing Techniques

Sentiment Analysis of Multilingual Tweets based on Natural Language Processing (NLP)

Deep Learning Model for COVID-19 Sentiment Analysis on Twitter

Exploring COVID-19 vaccine sentiment: a Twitter-based analysis of text processing and machine learning approaches