Abstract: App stores let users provide insightful feedback on apps, which developers can use to enhance and evolve their software. However, finding the user reviews that are valuable and relevant for quality improvement and app enhancement is challenging because the volume of end-user feedback keeps growing. Moreover, to the best of our knowledge, existing sentiment analysis approaches do not consider sarcasm and its types when identifying the sentiment of end-user reviews for requirements decision-making, and no work has been reported on detecting sarcasm in app reviews. This paper proposes an automated approach that detects sarcasm and its types in end-user reviews and identifies valuable requirements-related information using natural language processing (NLP) and deep learning (DL) algorithms, to help software engineers better understand end-user sentiments. For this purpose, we crawled 55,000 end-user comments on seven software apps in the Play Store. We then developed a novel sarcasm coding guideline by critically analyzing end-user reviews and recovering frequently used sarcasm types such as Irony, Humor, Flattery, Self-Deprecation, and Passive Aggression. Next, using the coding guideline and a content analysis approach, we annotated 10,000 user comments and made them parsable for state-of-the-art DL algorithms. We conducted a survey at two universities in Pakistan to measure how accurately participants identify sarcasm in end-user reviews manually, and we developed a ground truth against which to compare the results of the DL algorithms. We then applied various fine-tuned DL classifiers to first detect sarcasm in the end-user feedback and then classify the sarcastic reviews into the fine-grained sarcasm types. For this, end-user comments are first pre-processed and the instances in the dataset are balanced; feature engineering is then applied to fine-tune the DL classifiers. With the CNN, LSTM, BiLSTM, GRU, BiGRU, RNN, and BiRNN classifiers, we obtain average accuracies of 97%, 96%, 96%, 96%, 96%, 86%, and 90%, respectively, for binary classification, and 90%, 91%, 92%, 91%, 91%, 75%, and 89%, respectively, for the classification of sarcasm types. Such information would help improve the performance of sentiment analysis approaches so that they better capture the sentiments associated with newly identified features or issues.

An exploratory and automated study of sarcasm detection and classification in app stores using fine-tuned deep learning classifiers
