Sarcasm identification in textual data: systematic review, research challenges and open directions

Christopher Ifeanyi Eke,Azah Anir Norman,Liyana Shuib,Henry Friday Nweke
DOI: https://doi.org/10.1007/s10462-019-09791-8
IF: 9.588
2019-11-30
Artificial Intelligence Review
Abstract:Sarcasm is a form of sentiment whereby people express the implicit information, usually the opposite of the message content in order to hurt someone emotionally or criticise something in a humorous way. Sarcasm identification in textual data, being one of the hardest challenges in natural language processing (NLP), has recently become an interesting research area due to its importance in improving the sentiment analysis of social media data. A few studies have carried out a comprehensive literature review on sarcasm identification in the existing primary study within the last 11 years. Thus, this study carried out a review on the classification techniques for sarcasm identification under the aspects of datasets, pre-processing, feature engineering, classification algorithms, and performance metrics. The study has considered the published article from the period of 2008 to 2019. Forty (40) academic literature were selected from the 7 standard academic databases in order to carry out the review and realize the objectives. The study revealed that most researchers created their own datasets since there is no standard available datasets in the domain of sarcasm identification. Context and content-based linguistic features were used in most of the studies. This review shows that n-gram and parts of speech tagging techniques were the most commonly used feature extraction techniques. However, binary representation and term frequency were utilized for feature representation whereas Chi squared and information gain were used for the feature selection scheme. Moreover, classification algorithm such as support vector machine, Naïve Bayes, random forest, maximum entropy, and decision tree algorithm were mostly applied using accuracy, precision, recall and F-measure for performance measures. Finally, research challenges and future direction are summarized in this review. This review reveals the impact of sarcasm identification in building effective product reviews and would serve as handle resources for researchers and practitioners in sarcasm identification and text classification in general.
computer science, artificial intelligence
What problem does this paper attempt to address?