Abstract: There are over one million apps on Google Play Store and over half a million publishers. Such a large number of apps and developers can pose a challenge to app users and new publishers on the store. Apps that are not published in the right category are harder to discover, which in turn reduces earnings for app developers. Additionally, with over 41 categories on Google Play Store, deciding on the right category for an app can be difficult for developers. Machine learning has been very useful in classification problems such as sentiment analysis, document classification and spam detection. These strategies can also be applied to app categorization on Google Play Store to suggest appropriate categories for app publishers using details from their application. In this project, we built two variations of the Naive Bayes classifier using open metadata from top developer apps on Google Play Store in order to classify new apps on the store. These classifiers are then evaluated using various evaluation methods and their results compared against each other. The results show that the Naive Bayes algorithm performs well for our classification problem and can potentially automate app categorization for Android app publishers on Google Play Store.

What problem does this paper attempt to address?

The main problem this paper attempts to solve is how to use the Naive Bayes classification method to accurately classify applications in the Google Play store, so as to help developers choose the correct category in which to publish their applications. This can improve the user experience of finding applications and increase the downloads and revenue of developers' applications. Specifically, the paper addresses the following problems:

1. **Application classification problem**:
   - There are more than 1 million applications and more than 500,000 publishers in the Google Play store. With so many applications and categories, developers face challenges in choosing the correct category.
   - If applications are not correctly classified, users may have difficulty finding them, which reduces the applications' downloads and the developers' revenue.
2. **Automated classification suggestions**:
   - The paper proposes using the Naive Bayes classifier to provide automated category suggestions for new applications, based on application data from existing successful developers.
   - A classification model is constructed by extracting application metadata (such as application name, content rating, whether it is free, whether there are in-app purchases, and description) and weighting it with the TF-IDF (term frequency-inverse document frequency) statistic.
3. **Improving classification accuracy**:
   - The paper compares the performance of two Naive Bayes classifiers (Multinomial Naive Bayes and Bernoulli Naive Bayes) and verifies their effectiveness through multiple evaluation methods (such as k-fold cross-validation, confusion matrix, and F1-score).
   - The experimental results show that the Multinomial Naive Bayes classifier performs better when dealing with all categories, and that classification accuracy improves further after merging game-type applications into one large category.

### Formula display

The formulas involved in the paper are as follows:

1. **Bayes' theorem**:
   \[ P(A|B)=\frac{P(B|A)P(A)}{P(B)} \]
   where:
   - \(P(A|B)\) is the posterior probability: the probability that event \(A\) occurs given that event \(B\) has occurred.
   - \(P(B|A)\) is the likelihood: the probability that event \(B\) occurs given that event \(A\) has occurred.
   - \(P(A)\) is the prior probability: the probability that event \(A\) occurs.
   - \(P(B)\) is the marginal probability: the probability that event \(B\) occurs.
2. **Maximum a posteriori estimation (MAP)**:
   \[ c_{\text{MAP}}=\operatorname*{arg\,max}_{c\in C}\left(P(c|d)\right) \]
   \[ c_{\text{MAP}}=\operatorname*{arg\,max}_{c\in C}\left(P(c)\prod_{1\leq k\leq n_d}P(t_k|c)\right) \]
3. **Log-likelihood estimation**:
   \[ c_{\text{MAP}}=\operatorname*{arg\,max}_{c\in C}\left(\log P(c)+\sum_{1\leq k\leq n_d}\log P(t_k|c)\right) \]
4. **Laplace smoothing**:
   \[ P(t|c)=\frac{T_{ct}+1}{\sum_{t'\in V}(T_{ct'}+1)} \]
5. **TF-IDF calculation**:
   \[ w_n = \text{TF}_n\times\log(\text{IDF}_n) \]
   where:
   - \(\text{TF}_n\) is the term frequency of the \(n\)-th word in document \(D\).
   - \(\text{IDF}_n\) is the inverse document frequency of the \(n\)-th word.
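The log-space MAP rule and the Laplace-smoothed estimate of \(P(t|c)\) listed in the formulas can be illustrated with a small plain-Python sketch. This is not the paper's implementation: it uses raw term counts rather than TF-IDF weights, and the function names (`train`, `classify`), the toy descriptions, and the category labels are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit a multinomial Naive Bayes model.

    docs: list of (tokens, category) pairs (hypothetical toy input).
    Returns (log_prior, log_likelihood, vocab).
    """
    class_docs = Counter()              # number of training documents per category
    term_counts = defaultdict(Counter)  # T_ct: occurrences of term t in category c
    vocab = set()
    for tokens, c in docs:
        class_docs[c] += 1
        term_counts[c].update(tokens)
        vocab.update(tokens)

    n_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}

    log_likelihood = {}
    for c in class_docs:
        # Laplace smoothing: P(t|c) = (T_ct + 1) / sum_{t' in V} (T_ct' + 1)
        denom = sum(term_counts[c][t] + 1 for t in vocab)
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / denom) for t in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(model, tokens):
    """Return arg max over c of log P(c) + sum_k log P(t_k|c)."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Terms never seen in training are skipped (an assumption of this sketch).
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in vocab
        )
    return max(scores, key=scores.get)


# Toy app descriptions, invented for illustration only.
docs = [
    ("puzzle match levels score".split(), "GAME"),
    ("race cars levels multiplayer".split(), "GAME"),
    ("budget expense bank account".split(), "FINANCE"),
    ("stock bank portfolio tracker".split(), "FINANCE"),
]
model = train(docs)
print(classify(model, "bank budget tracker".split()))
print(classify(model, "levels puzzle arcade".split()))
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long descriptions, and the add-one term keeps a single unseen word from driving a category's score to zero.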

Applying Naive Bayes Classification to Google Play Apps Categorization
