Abstract: On social media platforms such as Twitter and Facebook, people express their views, arguments, and emotions about many events in daily life. Twitter is an international microblogging service featuring short messages, called “tweets”, written in many languages. These texts often contain noise in the form of incorrect grammar, abbreviations, freestyle writing, and typographical errors. Sentiment analysis (SA), a task within natural language processing (NLP), aims to predict the emotions that people actually express in raw text. The main aim of our work is to process raw sentences from a Twitter dataset and determine the polarity of each message. This paper proposes text normalization combined with a deep convolutional character-level embedding (Conv-char-Emb) neural network model for SA of unstructured data. The model addresses three problems: (1) processing noisy sentences for sentiment detection, (2) handling the limited memory space in word-level embedding learning, and (3) accurate sentiment analysis of unstructured data. The initial preprocessing stage performs text normalization in the following steps: tokenization, out-of-vocabulary (OOV) detection and replacement, lemmatization, and stemming. Character-based embedding in a convolutional neural network (CNN) is an effective and efficient technique for SA that uses fewer learnable parameters in feature representation. Thus, the proposed method performs both normalization and sentiment classification of unstructured sentences. The experimental results are evaluated on a Twitter dataset across different polarity classes (positive, negative, and neutral). Our model performs well in both normalization and sentiment analysis of raw Twitter data enriched with hidden information.
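The normalization steps listed in the abstract (tokenization, OOV detection and replacement, lemmatization, and stemming) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy vocabulary, the slang table, the lemma table, the similarity cutoff, and every function name below are hypothetical, chosen only to show how the stages might chain together using the standard library.

```python
import re
from difflib import get_close_matches

# Hypothetical toy resources (assumptions for illustration only);
# a real system would use a full lexicon and a trained lemmatizer.
VOCAB = {"i", "love", "this", "phone", "great", "you", "are", "the", "movie", "tomorrow"}
SLANG = {"u": "you", "r": "are", "gr8": "great", "luv": "love", "2moro": "tomorrow"}
LEMMAS = {"was": "be", "were": "be"}

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def squeeze(token):
    """Collapse runs of three or more repeated characters: 'phoneee' -> 'phone'."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def replace_oov(token):
    """Return an in-vocabulary replacement for an OOV token when one is found."""
    if token in VOCAB:
        return token                      # already in vocabulary
    if token in SLANG:
        return SLANG[token]               # known abbreviation
    squeezed = squeeze(token)
    if squeezed in VOCAB:
        return squeezed                   # elongated spelling
    # Fall back to string similarity; the 0.8 cutoff is an arbitrary assumption.
    match = get_close_matches(squeezed, sorted(VOCAB), n=1, cutoff=0.8)
    return match[0] if match else token   # leave unknown tokens unchanged


def lemmatize_or_stem(token):
    """Look up a lemma first, then fall back to crude suffix stripping."""
    if token in LEMMAS:
        return LEMMAS[token]
    for suffix in ("ing", "ed", "s"):
        # Keep at least four characters so short words such as 'this' survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Run the full pipeline: tokenize, replace OOV tokens, lemmatize or stem."""
    return [lemmatize_or_stem(replace_oov(t)) for t in tokenize(text)]


print(normalize("I luv this phoneee, u r gr8!!!"))
```

With these toy tables, the example tweet is normalized to `['i', 'love', 'this', 'phone', 'you', 'are', 'great']`. The normalized tokens would then be passed to the character-level embedding model for classification, which is not sketched here.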

Machine Normalization

Adapting Sequence to Sequence models for Text Normalization in Social Media

Automatic Standardization of Arabic Dialects for Machine Translation

Normalizing Text using Language Modelling based on Phonetics and String Similarity

Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities

Chinese-English Mixed Text Normalization

Machine learning based framework for fine-grained word segmentation and enhanced text normalization for low resourced language

Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German

Machine Translation for Accessible Multi-Language Text Analysis

Exploiting Dialect Identification in Automatic Dialectal Text Normalization

Text Normalization in Mandarin Text-to-speech System.

Character level embedding with deep convolutional neural network for text normalization of unstructured data for Twitter sentiment analysis

A Unified Tagging Approach to Text Normalization.

Building User-oriented Personalized Machine Translator Based on User-Generated Textual Content

A Three-Stage Text Normalization Strategy for Mandarin Text-to-Speech Systems

Text Normalization in Chinese Text-to-Speech System

Transferring Informal Text in Arabic as Low Resource Languages: State-of-the-Art and Future Research Directions

Content-Localization based Neural Machine Translation for Informal Dialectal Arabic: Spanish/French to Levantine/Gulf Arabic

Rule-Based Machine Translation from Tunisian Dialect to Modern Standard Arabic

Text normalization using memory augmented neural networks

Normalization of Chinese Chat Language