Abstract: Nigerian Pidgin, an adaptation of English, has evolved over the years through multi-language code switching, code mixing and linguistic adaptation. While Pidgin preserves many words from the standard English corpus, both in spelling and pronunciation, the fundamental meaning of these words has changed significantly. For example, 'ginger' is not a plant but an expression of motivation, and 'tank' is not a container but an expression of gratitude. The implication is that the current approach of applying English sentiment analysis directly to social media text from Nigeria is sub-optimal, as it cannot capture the semantic variation and contextual evolution in the contemporary meaning of these words. In practice, while many words in Nigerian Pidgin are the same as in standard English, sentiment analysis models built for English are not designed to capture the full intent of Nigerian Pidgin when it is used alone or code-mixed. By augmenting scarce human-labelled code-changed text with ample synthetic code-reformatted text and meaning, we achieve significant improvements in sentiment scoring. Our research explores how to understand sentiment in an intrasentential code mixing and switching context where there has been significant word localization. This work presents 300 VADER-lexicon-compatible Nigerian Pidgin sentiment tokens with their scores and 14,000 gold-standard Nigerian Pidgin tweets with their sentiment labels.
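
To illustrate why a Pidgin-aware lexicon matters, the following is a minimal sketch of lexicon-based scoring, not the authors' implementation. It is a simplified stand-in for a VADER-style scorer: it sums per-token valences and squashes the total into (-1, 1) with a normalization modeled on VADER's compound score. The lexicon entries, their values, the example sentence, and the names `ENGLISH_LEXICON`, `PIDGIN_LEXICON`, `normalize` and `score` are all assumptions made for illustration; the paper's actual 300 tokens and scores are not reproduced here.

```python
import math

# Hypothetical English baseline entries on a -4..+4 valence scale.
# Illustrative values only, not taken from any published lexicon.
ENGLISH_LEXICON = {"good": 1.9, "bad": -2.5}

# Hypothetical Pidgin entries: 'ginger' as motivation, 'tank' as gratitude.
# The scores are invented for this sketch.
PIDGIN_LEXICON = {"ginger": 2.0, "tank": 1.5}


def normalize(total, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return total / math.sqrt(total * total + alpha)


def score(text, lexicon):
    """Sum the valence of every known token, then normalize.

    Unknown tokens contribute 0, which is how an English-only lexicon
    ends up ignoring Pidgin sentiment words it does not list.
    """
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    total = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return normalize(total)


# Pidgin entries extend (and would override) the English baseline.
merged_lexicon = {**ENGLISH_LEXICON, **PIDGIN_LEXICON}

# Invented example sentence, not drawn from the paper's tweet corpus.
tweet = "That speech ginger me well well!"

english_only = score(tweet, ENGLISH_LEXICON)
with_pidgin = score(tweet, merged_lexicon)

print(f"English-only lexicon: {english_only:.3f}")
print(f"With Pidgin tokens:   {with_pidgin:.3f}")
```

With the English-only lexicon the sentence scores as neutral because 'ginger' is not listed as a sentiment word, whereas the merged lexicon yields a positive score. A real pipeline would also need negation handling, intensifiers and proper tokenization, which this sketch omits.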

Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding

It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations

From Genesis to Creole Language

Morphological Inflection: A Reality Check

Linguistically inspired morphological inflection with a sequence to sequence model

Universal Dependencies Parsing For Colloquial Singaporean English

Morphological Inflection with Phonological Features

Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models

A Systematic Analysis of Vocabulary and BPE Settings for Optimal Fine-tuning of NMT: A Case Study of In-domain Translation

Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin

Improving Korean NLP Tasks with Linguistically Informed Subword Tokenization and Sub-character Decomposition

Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations

Modeling Target-Side Inflection in Neural Machine Translation

A Novel Cascade Instruction Tuning Method for Biomedical NER

HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection

Bridging the Gap between Text, Audio, Image, and Any Sequence: A Novel Approach using Gloss-based Annotation

Semantic Tokenizer for Enhanced Natural Language Processing

Multi-VALUE: A Framework for Cross-Dialectal English NLP

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Semantic Enrichment of Nigerian Pidgin English for Contextual Sentiment Classification

BiSECT: Learning to Split and Rephrase Sentences with Bitexts