Cotton leaf curl disease (CLCuD) prediction modeling in upland cotton under different ecological conditions using machine learning tools

Satish Kumar Sain, Debashis Paul, Pradeep Kumar, Ashok Kumar, Man Mohan, D. Monga, A.H. Prakash, Y.G. Prasad
DOI: https://doi.org/10.1016/j.ecoinf.2024.102648
IF: 5.1
2024-05-23
Ecological Informatics
Abstract: Cotton leaf curl disease (CLCuD) is a major threat to cotton production in Africa and South Asia. Because absolutely CLCuD-resistant cultivars and effective strategies for managing the vector, Bemisia tabaci, are lacking, yield losses in cotton crops occur regularly. To ensure the timely application of management practices, there is a dire need for a reliable prediction model that can forecast CLCuD with high speed and accuracy. To address this problem, we developed and compared four machine learning (ML) techniques to predict CLCuD under field conditions: multiple linear regression (MLR), bootstrap forest (BSF), boosted tree (BST) and artificial neural network (ANN). We investigated disease and weather data collected from four locations (hot spots) over a period of nine years (2011–12 to 2019–20). During each season/year, 10 ecological variables were recorded, including CLCuD incidence, severity data from ~8000 plants on four cultivars (2000 each), and ecological factors at each location. Temperature data were transformed to growing degree days (GDD) and used for modeling as one of the most dependent factors. All data sets were divided into training and validation sets in a 75:25 ratio for the machine learning models. Results indicated that the BSF model was the best ML model for CLCuD prediction, with the highest R-squared value (R² training = 0.81 and R² validation = 0.64). The ANN model with 14 hidden nodes and an architecture of (9:14:1) achieved a slightly lower R-squared value in training (R² training = 0.80 and R² validation = 0.79), whereas the BST model achieved the lowest R-squared value (R² training = 0.71 and R² validation = 0.59). The testing and validation of activation functions and various training and validation sets indicated that the BSF model is the best ML model. This model can provide technical support for CLCuD prediction under field conditions through the designed graphical user interface and can further support the timely application of management interventions to boost cotton productivity in the region.
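The abstract mentions three computational steps without giving details: converting temperature to GDD, splitting the data 75:25, and scoring models with R-squared. The following is only a minimal sketch of what those steps might look like, not the authors' implementation. The simple-averaging GDD formula, the base temperature of 15.6 °C (a value commonly used for cotton), the shuffling seed, and all function names are assumptions that do not come from the abstract.

```python
import random


def growing_degree_days(t_max, t_min, t_base=15.6):
    """Daily GDD by the simple averaging method (assumed, not from the paper).

    t_base=15.6 degrees C is an assumed base temperature for cotton.
    """
    return max(0.0, (t_max + t_min) / 2.0 - t_base)


def split_75_25(rows, seed=0):
    """Shuffle the rows and split them 75:25 into training and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # hypothetical seed, for reproducibility
    cut = int(len(rows) * 0.75)
    return rows[:cut], rows[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    print(growing_degree_days(35.0, 25.0))
    train, valid = split_75_25(range(8))
    print(len(train), len(valid))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Computing R-squared separately on the training and validation subsets yields the kind of paired values the abstract reports for each model.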
ecology