Abstract: On social media platforms such as Twitter and Facebook, people express their views, arguments, and emotions about many events in daily life. Twitter is an international microblogging service featuring short messages called "tweets" written in many languages. These texts often contain noise in the form of incorrect grammar, abbreviations, freestyle writing, and typographical errors. Sentiment analysis (SA), a task within natural language processing (NLP), aims to predict the actual emotions that people express in raw text. The main aim of our work is to process raw sentences from a Twitter dataset and find the actual polarity of each message. This paper proposes text normalization combined with a deep convolutional character-level embedding (Conv-char-Emb) neural network model for SA of unstructured data. This model tackles three problems: (1) processing noisy sentences for sentiment detection, (2) coping with limited memory space in word-level embedding learning, and (3) accurate sentiment analysis of unstructured data. The initial preprocessing stage, which performs text normalization, includes the following steps: tokenization, out-of-vocabulary (OOV) detection and replacement, lemmatization, and stemming. Character-based embedding in a convolutional neural network (CNN) is an effective and efficient technique for SA that uses fewer learnable parameters in feature representation. Thus, the proposed method performs both normalization and sentiment classification for unstructured sentences. The experimental results are evaluated on the Twitter dataset at different polarity points (positive, negative, and neutral). As a result, our model performs well in normalization and sentiment analysis of raw Twitter data enriched with hidden information.

What problem does this paper attempt to address?

The main problem this paper addresses is how to handle noisy text in Twitter data and perform accurate sentiment analysis on it. Specifically, the authors propose a deep convolutional neural network model based on character-level embedding (Conv-char-Emb), aiming to solve the following three problems:

1. **Dealing with noisy sentences for sentiment detection**: Texts on Twitter usually contain noise such as grammar mistakes, abbreviations, free-style writing, and spelling errors, which reduces the accuracy of sentiment analysis. An effective method is therefore needed to deal with these noisy sentences.
2. **Dealing with the small memory space problem in word-level embedding learning**: Traditional word-level embedding methods require a large vocabulary and a correspondingly large memory space, which is especially noticeable when dealing with multi-language texts. Character-level embedding can reduce the required memory space and improve the efficiency of the model.
3. **Conducting accurate sentiment analysis on unstructured data**: Unstructured data (such as texts on social media) are usually difficult to process because they lack a unified format and structure. With character-level embedding and a deep convolutional neural network, features can be extracted more effectively and sentiment classification can be carried out.

To achieve these goals, the paper proposes a method that includes the following steps:

- **Pre-processing stage**:
  - **Tokenization**: Divide the input text into words or phrases.
  - **Out-of-vocabulary (OOV) detection and replacement**: Use multiple dictionaries (such as Microsoft Dictionary, SMS Dictionary, and Soundex Dictionary) to correct spelling mistakes and non-standard words.
  - **Lemmatization**: Restore different forms of words to their basic forms.
  - **Stemming**: Further reduce words to their root forms.
- **Character-level embedding and deep convolutional neural network (CNN)**: Use character-level embedding to convert texts into vector representations, then carry out feature extraction and sentiment classification with a deep convolutional neural network.

Through these methods, the paper aims to improve the accuracy and efficiency of extracting sentiment from noisy texts, especially when dealing with multi-language and unstructured data. Experimental results show that the model performs well on Twitter datasets and can effectively carry out text normalization and sentiment analysis.
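
The normalization steps and the character-level input described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny `VOCAB` and `SLANG` tables, the `ALPHABET`, and all function names are hypothetical stand-ins for the dictionaries and encoding the paper refers to. Lemmatization, stemming, and the CNN itself are left out, since they would normally come from an NLP or deep learning library.

```python
import re

# Hypothetical toy resources; the paper's actual dictionaries are not reproduced here.
VOCAB = {"i", "love", "this", "movie", "you", "are", "great"}
SLANG = {"u": "you", "r": "are", "gr8": "great", "luv": "love"}

# Assumed character inventory; id 0 is reserved for padding and unknown characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def squeeze_repeats(token):
    """Collapse runs of three or more identical characters into one."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def normalize(text):
    """Tokenize, detect OOV tokens, and replace them where a rule applies."""
    tokens = []
    for tok in tokenize(text):
        if tok not in VOCAB:              # OOV detection
            tok = squeeze_repeats(tok)    # "loooove" -> "love"
            tok = SLANG.get(tok, tok)     # dictionary replacement, if any
        tokens.append(tok)
    return tokens


def encode_chars(tokens, max_len=32):
    """Map the normalized text to a fixed-length list of character ids."""
    text = " ".join(tokens)[:max_len]
    ids = [CHAR_TO_ID.get(ch, 0) for ch in text]
    return ids + [0] * (max_len - len(ids))


tokens = normalize("I loooove this movie, u r gr8!!!")
char_ids = encode_chars(tokens)
```

The list of ids in `char_ids` is the kind of input a character embedding layer would consume. With this assumed alphabet the embedding table needs only 38 rows (37 characters plus the padding id), no matter how many distinct words appear in the tweets, which is the memory argument made for character-level over word-level embedding.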

Character level embedding with deep convolutional neural network for text normalization of unstructured data for Twitter sentiment analysis

Transformer-based deep learning models for the sentiment analysis of social media data

Multi-layered perceptron based deep learning model for emotion extraction on monolingual text using intelligence feature engineering and filtering techniques

Social Media Opinion Summarization Using Emotion Cognition and Convolutional Neural Networks

Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman Urdu Text Shared in Social Media

Hybrid deep learning approach for sentiment analysis using text and emojis

Hybrid Deep Learning Approach for Sentiment Analysis on Twitter Data

A supervised deep learning-based sentiment analysis by the implementation of Word2Vec and GloVe Embedding techniques

An Intelligent High Performance Automatic Sentiment Analysis Model Creation using Deep Convolution Neural Network

Exploring Deep Neural Networks and Transfer Learning for Analyzing Emotions in Tweets

Attention-Based CNN and Bi-LSTM Model Based on TF-IDF and GloVe Word Embedding for Sentiment Analysis

Leveraging distant supervision and deep learning for twitter sentiment and emotion classification

A Deep Neural Architecture for Sentence-level Sentiment Classification in Twitter Social Networking

High accuracy offering attention mechanisms based deep learning approach using CNN/bi-LSTM for sentiment analysis

Tweets Sentiment Analysis via Word Embeddings and Machine Learning Techniques

Sentiment Analysis on Social Media Content

Deep Learning Paradigm with Transformed Monolingual Word Embeddings for Multilingual Sentiment Analysis

ConvLSTMConv network: a deep learning approach for sentiment analysis in cloud computing

Political Sentiment Analysis of Persian Tweets Using CNN-LSTM Model

Text-based Sentiment Analysis and Music Emotion Recognition

Implementing Sentiment Analysis on Real-Time Twitter Data