Abstract: Crowdsourcing has gradually become an effective e-government process for gathering citizen complaints about the implementation of various public services. In practice, the collected complaints form a massive dataset, making it difficult for government officers to analyze such big data effectively. It is therefore vital to use data mining algorithms to classify citizen complaint data so that follow-up actions are efficient. However, different classification algorithms achieve different accuracies. Accordingly, this study aimed to compare the accuracy of several classification algorithms on crowdsourced citizen complaint data. Taking the case of the LAKSA app in Tangerang City, Indonesia, this study assessed the accuracy of k-Nearest Neighbors, Random Forest, Support Vector Machine (SVM), and AdaBoost. The data were crowdsourced citizen complaints submitted to the LAKSA app from May 2021 to April 2022, including complaints aggregated from official social media channels. The results showed that SVM with a linear kernel was the most accurate of the assessed algorithms (89.2%), whereas AdaBoost (with Decision Trees as the base learner) produced the lowest accuracy. Nevertheless, the accuracy of every algorithm varied with the amount of training data available for each actual classification category. Overall, the accuracies of all algorithms did not differ significantly, with an overall variation of 4.3%. The AdaBoost-based classification, in particular, depended heavily on the choice of base learner. Through its method and results, this study contributes to the e-government, data mining, and big data discourses. This research recommends that governments continuously conduct supervised training of classification algorithms on their crowdsourced citizen complaints to achieve the highest possible accuracy, paving the way for smart and sustainable governance.
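The following is a minimal, stdlib-only Python sketch of one of the four assessed algorithms, k-Nearest Neighbors. It shows what an accuracy assessment involves. It is not the study's implementation. The complaint texts, the category names, the bag-of-words cosine similarity, the similarity-weighted vote, and `k=3` are all hypothetical choices for illustration, not the LAKSA data or the authors' configuration. Accuracy is computed as the fraction of held-out complaints whose predicted category matches the true category.

```python
import math
from collections import Counter

# HYPOTHETICAL toy data: invented complaint texts and category names,
# NOT taken from the LAKSA dataset used in the study.
TRAIN = [
    ("streetlight broken on main road", "infrastructure"),
    ("pothole damaged my motorbike", "infrastructure"),
    ("road flooded after heavy rain", "infrastructure"),
    ("garbage not collected for a week", "sanitation"),
    ("trash piling up near the market", "sanitation"),
    ("bad smell from uncollected garbage", "sanitation"),
    ("long queue at the id card office", "public_service"),
    ("staff at the permit office were rude", "public_service"),
]
TEST = [
    ("broken streetlight near school", "infrastructure"),
    ("garbage truck did not come", "sanitation"),
    ("permit office queue too long", "public_service"),
    ("the office garbage bin is full", "sanitation"),
]


def vectorize(text):
    """Bag-of-words term counts (lower-cased, whitespace-split, no stop-word removal)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (missing terms count as 0)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(text, train, k=3):
    """Predict a category by a similarity-weighted vote of the k most similar training complaints."""
    query = vectorize(text)
    neighbors = sorted(
        ((cosine(query, vectorize(doc)), label) for doc, label in train),
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, label in neighbors:
        votes[label] += similarity  # closer neighbors carry more weight
    return votes.most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test complaints whose predicted category equals the true category."""
    correct = sum(predict(text) == label for text, label in test)
    return correct / len(test)


if __name__ == "__main__":
    acc = accuracy(lambda text: knn_predict(text, TRAIN, k=3), TEST)
    print(f"k-NN accuracy on the toy test set: {acc:.1%}")
```

Run as a script, this prints an accuracy of 75.0%. Three of the four toy complaints are classified correctly. The fourth, "the office garbage bin is full", is pulled toward the public-service category because it shares the words "the" and "office" with two public-service complaints. The `accuracy` function accepts any classifier's predict function. Evaluating every algorithm on the same held-out complaints keeps their accuracies directly comparable.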

Classifying Crowdsourced Citizen Complaints through Data Mining: Accuracy Testing of k-Nearest Neighbors, Random Forest, Support Vector Machine, and AdaBoost
