Shannon Entropy is better Feature than Category and Sentiment in User Feedback Processing

Andres Rojas Paredes, Brenda Mareco
2024-09-18
Abstract: App reviews in mobile app stores contain useful information that is used to improve applications and promote software evolution. This information is processed by automatic tools that prioritize reviews. To carry out this prioritization, reviews are decomposed into features such as category and sentiment. A weighted function then assigns a weight to each feature, and a review ranking is calculated. Unfortunately, extracting category and sentiment from reviews requires at least one classifier trained on an annotated corpus, so this task is computationally demanding. In this work, we therefore propose Shannon Entropy as a simple feature that can replace the standard features. Our results show that a Shannon Entropy based ranking is better than a standard ranking according to the NDCG metric. This result is promising even if we require fairness by means of algorithmic bias. Finally, we highlight a computational limit that appears in the search for the best ranking.
Software Engineering
What problem does this paper attempt to address?
The main problem this paper addresses is how to extract and use the information in user feedback (especially app reviews) more effectively, in order to improve applications and support software evolution. Specifically, the paper explores the following issues:

1. **Optimal Weight Combination**:
   - Which combination of feature weights produces review rankings that are closest, according to the NDCG (Normalized Discounted Cumulative Gain) metric, to the rankings manually labeled by experts?
   - Expressed by the formula: \[ R(c)=\sum_{i = 1}^{4}w_i\cdot f_i(c) \] where \(w_i\) represents the weight of the \(i\)-th feature, and \(f_i(c)\) represents the scoring factor of the \(i\)-th feature.
2. **Effectiveness of Shannon Entropy as a Feature**:
   - Can Shannon entropy replace traditional features (such as category and sentiment) for review ranking? The paper tests Shannon entropy as a feature experimentally and finds that it performs better than the traditional features.
   - The specific formula is: \[ H(X)=-\sum_{i = 1}^{n}p(x_i)\log_2 p(x_i) \] where \(p(x_i)\) is the probability of the occurrence of the character \(x_i\).
3. **Impact of Calculation Precision on Performance**:
   - When the precision of the weights is increased (for example, from two decimal places to three), do the computing resources and time required exceed what is practically feasible? The paper reports that three-decimal-place precision causes a sharp increase in the number of weight combinations, and with it a significant increase in calculation time and disk space.
4. **Algorithm Bias and Its Mitigation**:
   - How can biases between countries introduced by the algorithm be detected and mitigated? The paper uses the AIF360 tool to detect biases and applies the re-weighting algorithm for mitigation, but finds that this reduces the NDCG value.
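To make the entropy feature and the NDCG comparison concrete, here is a minimal Python sketch. It is not the authors' implementation: the sample reviews, the relevance labels, and the function names are hypothetical, and it assumes the common NDCG form with linear gain and a \(\log_2\) position discount, which may differ from the variant used in the paper.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy H(X) of a string, in bits."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def dcg(relevances):
    """Discounted cumulative gain of relevance labels given in ranked order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical reviews paired with made-up expert relevance labels.
reviews = [
    ("good", 1),
    ("App crashes when I upload a photo on Android 14", 3),
    ("aaaaaaaa", 0),
]

# Rank by entropy, highest first, then score that ranking against the labels.
ranked = sorted(reviews, key=lambda r: shannon_entropy(r[0]), reverse=True)
score = ndcg([label for _, label in ranked])
```

The entropy needs only character counts, with no trained classifier or annotated corpus. That is the property the paper relies on when it proposes entropy as a replacement for category and sentiment.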
### Main Conclusions

- **Advantages of Shannon Entropy as a Feature**: Shannon entropy can effectively replace the traditional category and sentiment features, and it is simple to calculate, with no need for complex machine-learning models or annotated corpora.
- **Computational Resource Limitations**: As the weight precision increases, the computational resource requirements grow exponentially, and the search becomes infeasible at three-decimal-place precision.
- **Bias and Fairness**: Although Shannon entropy improves ranking accuracy, biases between countries remain, and further research is needed on how to optimize the ranking while maintaining fairness.

These results suggest new directions for future user-feedback processing, especially in terms of feature selection and computational efficiency.
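The computational limit can be illustrated by counting weight vectors. The sketch below rests on an assumption that the summary does not confirm: each of the four weights in \(R(c)=\sum_i w_i\cdot f_i(c)\) independently takes every value in \([0, 1]\) at the chosen precision. The function names are hypothetical.

```python
from itertools import product


def grid_size(decimals: int, n_weights: int = 4) -> int:
    """Count weight vectors when each weight independently takes every
    value in [0, 1] at the given decimal precision (an assumption)."""
    return (10 ** decimals + 1) ** n_weights


def weight_grid(decimals: int, n_weights: int = 4):
    """Lazily enumerate the weight vectors counted by grid_size."""
    steps = 10 ** decimals
    values = [k / steps for k in range(steps + 1)]
    return product(values, repeat=n_weights)


def ranking_score(weights, factors):
    """R(c) = sum_i w_i * f_i(c) for one review's feature factors."""
    return sum(w * f for w, f in zip(weights, factors))
```

Under this assumption, `grid_size(2)` is 104,060,401 and `grid_size(3)` is 1,004,006,004,001. That is nearly 10,000 times more candidate rankings to evaluate and store, consistent with the sharp increase the paper describes.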