Ensemble Feature Selection Framework for Paddy Yield Prediction in Cauvery Basin using Machine Learning Classifiers

P Sathya, P Gnanasekaran
DOI: https://doi.org/10.1080/23311916.2023.2250061
2023-08-29
Cogent Engineering
Abstract: Machine learning techniques require a significant amount of time, and model performance degrades on multi-dimensional datasets because of redundant features. Feature selection is a significant step in machine learning: it selects a subset of relevant features from a larger feature set, which improves model performance by simplifying the model. Crop yield prediction is a significant application of machine learning, and feature selection plays a crucial role in it. Weather, soil, and crop attributes are essential factors for predicting paddy yield accurately. Feature selection techniques are therefore employed to identify relevant, non-redundant attributes in a larger dataset, which simplifies the prediction model. In this study, an ensemble feature selection method is proposed that selects an optimized subset of attributes by combining various attribute subsets based on the mutual information between attributes and between each attribute and the class. The proposed ensemble approach is validated using five classification techniques: K-nearest neighbor, Random Forest, Support Vector Machine, Naive Bayes, and Bagging. Several evaluation metrics (Accuracy, Error rate, Kappa, Precision, Recall, Specificity, and F1 score) are used to assess the performance of the ensemble approach and to compare it with other feature selection techniques. The experimental results indicate that the proposed ensemble approach performs best with the Random Forest classifier, which outperforms the other classifiers with Accuracy and Error rate values of 0.9491 and 0.0509, respectively.
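The abstract does not spell out how the attribute subsets are combined, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes the candidate subsets are pooled and then filtered greedily, keeping attributes with high mutual information with the class and low mutual information with attributes already kept. The function names, the attribute names (`rainfall`, `soil`, and so on), and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def ensemble_select(candidate_subsets, columns, labels, k):
    """Pool the candidate subsets, then greedily keep up to k attributes.

    Assumed scoring rule (not taken from the paper): relevance to the class
    minus the mean redundancy with the attributes chosen so far.
    """
    pool = sorted({name for subset in candidate_subsets for name in subset})
    chosen = []
    while pool and len(chosen) < k:
        def score(name):
            relevance = mutual_information(columns[name], labels)
            if not chosen:
                return relevance
            redundancy = sum(
                mutual_information(columns[name], columns[c]) for c in chosen
            ) / len(chosen)
            return relevance - redundancy

        best = max(pool, key=score)
        chosen.append(best)
        pool.remove(best)
    return chosen


# Hypothetical discretised toy data: yield class = rainfall level + soil level.
columns = {
    "rainfall":     [0, 0, 0, 0, 1, 1, 1, 1],
    "rainfall_dup": [0, 0, 0, 0, 1, 1, 1, 1],  # redundant copy of rainfall
    "soil":         [0, 0, 1, 1, 0, 0, 1, 1],
    "noise":        [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the class
}
labels = [0, 0, 1, 1, 1, 1, 2, 2]

# Subsets as they might come from three different base selectors (made up).
candidate_subsets = [["rainfall", "rainfall_dup"], ["soil", "noise"], ["rainfall", "soil"]]

selected = ensemble_select(candidate_subsets, columns, labels, k=2)
print(selected)  # ['rainfall', 'soil']
```

On this toy data the duplicate attribute is rejected because it is fully redundant with `rainfall`, and `noise` is rejected because it carries no information about the class. In a real pipeline, continuous weather and soil measurements would first need to be discretised or handled with a continuous mutual information estimator.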