A Comparative Analysis of XGBoost and Neural Network Models for Predicting Some Tomato Fruit Quality Traits from Environmental and Meteorological Data

Oussama M’hamdi, Sándor Takács, Gábor Palotás, Riadh Ilahy, Lajos Helyes, Zoltán Pék
DOI: https://doi.org/10.3390/plants13050746
2024-03-07
Plants
Abstract: The tomato, as a raw material for processing, is globally important and is pivotal in dietary and agronomic research because of its nutritional, economic, and health significance. This study explored the potential of machine learning (ML) for predicting tomato quality, using data from 48 cultivars and 28 locations in Hungary over 5 seasons. It focused on °Brix, lycopene content, and colour (a/b ratio), using extreme gradient boosting (XGBoost) and artificial neural network (ANN) models. XGBoost consistently outperformed ANN, achieving high accuracy in predicting °Brix (R² = 0.98, RMSE = 0.07) and lycopene content (R² = 0.87, RMSE = 0.61), and excelling in colour prediction (a/b ratio) with an R² of 0.93 and an RMSE of 0.03. ANN lagged behind, particularly in colour prediction, where it showed a negative R² of −0.35 (that is, it performed worse than simply predicting the mean). Shapley additive explanations (SHAP) summary plot analysis indicated that both models are effective in predicting °Brix and lycopene content in tomatoes, each highlighting different aspects of the data. SHAP analysis also underscored the significant influence of cultivar choice and of environmental factors such as climate and soil. These findings emphasize the importance of selecting and fine-tuning the appropriate ML model for precision agriculture and underline XGBoost's superiority in handling complex agronomic data for quality assessment.
plant sciences
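The negative R² reported for the ANN colour model is easiest to read against the metric definitions. R² = 1 − SS_res/SS_tot compares a model's squared error with that of always predicting the mean of the observed values, so it falls below zero whenever the model does worse than that trivial baseline; RMSE is the square root of the mean squared residual, in the units of the target. The following is a minimal Python sketch of both metrics, assuming plain lists of numbers; the observed and predicted a/b ratios are made up for illustration and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot is the squared error of always predicting the mean of y_true,
    so the score is negative whenever the model's squared error exceeds it.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical a/b ratios for illustration only (not values from the paper).
observed = [2.0, 2.2, 2.4, 2.6]
close_fit = [2.05, 2.15, 2.45, 2.55]   # small errors around the truth
poor_fit = [2.6, 2.0, 2.6, 2.0]        # errors larger than the spread of the data

print(r2_score(observed, close_fit), rmse(observed, close_fit))
print(r2_score(observed, poor_fit), rmse(observed, poor_fit))
```

With these toy numbers the close predictions give an R² of about 0.95 (RMSE 0.05), while the poor ones give an R² of about −3, the same kind of sign flip the abstract reports for the ANN colour model.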
What problem does this paper attempt to address?
This paper addresses the use of machine learning (ML) to predict tomato fruit quality traits, specifically soluble solids content (°Brix), lycopene content, and colour (a/b ratio), from environmental and meteorological data. The study compares two models, extreme gradient boosting (XGBoost) and an artificial neural network (ANN), on these traits. XGBoost outperforms ANN on all three, with higher R² values and lower root mean square error (RMSE); the gap is largest for colour, where the ANN's R² is negative (−0.35). SHAP analysis confirms that the models predict °Brix and lycopene content effectively and highlights the importance of cultivar selection and of environmental factors such as climate and soil. The paper stresses the importance of selecting and fine-tuning appropriate ML models for precision agriculture and points to XGBoost's superiority in handling complex agronomic data.
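SHAP attributes each prediction to the input features using Shapley values: a feature's value is its marginal contribution to the prediction, averaged over all coalitions of the other features with the standard Shapley weights. The sketch below computes these values exactly by enumerating coalitions for a hypothetical three-feature toy model (the feature names and coefficients are invented, not taken from the paper). It assumes one simple value function, replacing absent features with a single baseline value; the shap library instead uses faster algorithms such as TreeSHAP for tree ensembles like XGBoost, and the paper's exact SHAP configuration is not stated here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Assumption: features outside a coalition take their baseline value.
    Cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Build an input that uses x for coalition members, baseline otherwise.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: a °Brix-like score from three invented features,
# with an interaction between "cultivar" and "soil".
def toy_model(z):
    cultivar, temperature, soil = z
    return 4.0 + 0.5 * cultivar + 0.1 * temperature + 0.2 * cultivar * soil


x = [1.0, 20.0, 2.0]          # instance to explain (made-up values)
baseline = [0.0, 15.0, 1.0]   # reference input (made-up values)
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to toy_model(x) − toy_model(baseline) (the efficiency property), and the interaction term is split between the two features involved, which is why cultivar-like and environment-like features can both appear prominently in a SHAP summary plot.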