Establishing a soil carbon flux monitoring system based on support vector machine and XGBoost

Hanwei Ding
DOI: https://doi.org/10.1007/s00500-024-09641-y
IF: 3.732
2024-01-30
Soft Computing
Abstract: Soil carbon fluxes are pivotal indicators of climate impacts, yet field-level monitoring remains challenging. This study puts forth an integrated framework coupling support vector machine (SVM) and XGBoost algorithms to enable automated, precise tracking of peat soil carbon dioxide emissions. The core methodology handles a multi-dimensional dataset encompassing 72-h flux measurements from 360 intact tropical peat cores under controlled moisture conditions spanning 30–85% water-filled pore space across intact, logged, and oil palm converted sites. Preprocessing via outlier elimination and missing value imputation, coupled with tenfold cross-validation, lays the analytical foundation. SVM first applies a nonlinear transformation through Gaussian radial basis functions to classify complex soil respiration patterns; an optimized hyperplane decision boundary partitions the high-dimensional space to separate classes. XGBoost subsequently constructs an ensemble of weighted decision trees targeting residual errors to incrementally boost predictions over 500 iterations. The integrated framework combines SVM and XGBoost outputs using performance-based weighting, so that the integrated predictions leverage the complementary strengths of both models. This allows efficient mapping of intricate moisture, temperature, oxygen availability, microbial activity, and land use effects on peat soil carbon dioxide production and emission dynamics. Peaking at 94.4% accuracy, 92% precision, 91% recall, and an RMSE of 0.3, SVM with XGBoost surpasses neural networks, LSTM, gradient boosting, and regression trees, indicating that it effectively encodes intricate moisture, texture, and land use effects on soil respiration. Clustered data representations confirm the feasibility of mapping complex emission behaviors across intact and drained sites. Overall, the dual framework delivers a precise, automated system for responsive soil carbon monitoring and modeling at scale.
Next phases should focus on expanding multivariate input data and assessing generalizability across soil and vegetation types.
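The abstract describes two ideas that a short sketch can make concrete: the Gaussian radial basis function used by the SVM, and the performance-based weighting that merges SVM and XGBoost outputs. The Python below is only an illustration under stated assumptions, not the paper's implementation. The function names, the `gamma` value, the validation scores, and the per-sample probabilities are all hypothetical, and weighting by normalized validation accuracy is one plausible reading of "performance-based weighting".

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF similarity: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def performance_weights(scores):
    """Normalize per-model validation scores so the weights sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {name: score / total for name, score in scores.items()}


def combine(predictions, weights):
    """Weighted average of per-sample predictions across models."""
    names = list(predictions)
    n_samples = len(predictions[names[0]])
    return [
        sum(weights[m] * predictions[m][i] for m in names)
        for i in range(n_samples)
    ]


# Hypothetical validation accuracies (not values from the paper).
val_scores = {"svm": 0.90, "xgboost": 0.94}

# Hypothetical per-sample class probabilities from each model.
probs = {
    "svm": [0.80, 0.30, 0.55],
    "xgboost": [0.90, 0.20, 0.40],
}

weights = performance_weights(val_scores)
blended = combine(probs, weights)

# Threshold the blended probability to obtain class labels.
labels = [int(p >= 0.5) for p in blended]
```

In this sketch the model with the higher validation score receives the larger weight, so a disagreement between the two models (the third sample) is resolved in favor of the better-performing one only when its margin is large enough.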
computer science, artificial intelligence, interdisciplinary applications