Wine feature importance and quality prediction: A comparative study of machine learning algorithms with unbalanced data

Siphendulwe Zaza, Marcellin Atemkeng, Sisipho Hamlomo
DOI: https://doi.org/10.48550/arXiv.2310.01584
2023-10-03
Abstract: Classifying wine as "good" is a challenging task due to the absence of a clear criterion. Nevertheless, an accurate prediction of wine quality can be valuable in the certification phase. Previously, wine quality was evaluated solely by human experts, but with the advent of machine learning this evaluation process can now be automated, thereby reducing the time and effort required from experts. Feature selection can be used to examine the impact of analytical tests on wine quality. If specific input variables are shown to have a significant effect on predicting wine quality, this information can be used to improve the production process. We studied feature importance, which allowed us to explore various factors that affect the quality of the wine. The feature importance analysis suggests that alcohol significantly impacts wine quality. Furthermore, several machine learning models are compared, including Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting (GB), K-Nearest Neighbors (KNN), and Decision Tree (DT). The analysis revealed that SVM outperformed all other models, with a 96% accuracy rate.