Comparative Analysis of Two Machine Learning Algorithms in Predicting Site-Level Net Ecosystem Exchange in Major Biomes

Jianzhao Liu,Yunjiang Zuo,Nannan Wang,Fenghui Yuan,Xinhao Zhu,Lihua Zhang,Jingwei Zhang,Ying Sun,Ziyu Guo,Yuedong Guo,Xia Song,Changchun Song,Xiaofeng Xu
DOI: https://doi.org/10.3390/rs13122242
IF: 5
2021-06-08
Remote Sensing
Abstract: The net ecosystem CO2 exchange (NEE) is a critical parameter for quantifying terrestrial ecosystems and their contributions to the ongoing climate change. The accumulation of ecological data is calling for more advanced quantitative approaches for assisting NEE prediction. In this study, we applied two widely used machine learning algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), to build models for simulating NEE in major biomes based on the FLUXNET dataset. Both models accurately predicted NEE in all biomes, while XGBoost had higher computational efficiency (6~62 times faster than RF). Among environmental variables, net solar radiation, soil water content, and soil temperature are the most important variables, while precipitation and wind speed are less important variables in simulating temporal variations of site-level NEE as shown by both models. Both models perform consistently well for extreme climate conditions. Extreme heat and dryness led to much worse model performance in grassland (extreme heat: R² = 0.66~0.71, normal: R² = 0.78~0.81; extreme dryness: R² = 0.14~0.30, normal: R² = 0.54~0.55), but the impact on forest is less (extreme heat: R² = 0.50~0.78, normal: R² = 0.59~0.87; extreme dryness: R² = 0.86~0.90, normal: R² = 0.81~0.85). Extreme wet condition did not change model performance in forest ecosystems (with R² changing −0.03~0.03 compared with normal) but led to substantial reduction in model performance in cropland (with R² decreasing 0.20~0.27 compared with normal). Extreme cold condition did not lead to much changes in model performance in forest and woody savannas (with R² decreasing 0.01~0.08 and 0.09 compared with normal, respectively). Our study showed that both models need training samples at daily timesteps of >2.5 years to reach a good model performance and >5.4 years of daily samples to reach an optimal model performance.
In summary, both RF and XGBoost are applicable machine learning algorithms for predicting ecosystem NEE, and XGBoost algorithm is more feasible than RF in terms of accuracy and efficiency.
environmental sciences,imaging science & photographic technology,remote sensing,geosciences, multidisciplinary
What problem does this paper attempt to address?
This paper addresses the prediction of site-level net ecosystem exchange (NEE) in major biomes. Specifically, the research aims to:

1. **Compare two machine-learning algorithms**: Random Forest (RF) and Extreme Gradient Boosting (XGBoost), in terms of their performance in predicting site-level NEE.
2. **Evaluate the importance of environmental variables**: determine which environmental variables matter most for simulating NEE.
3. **Assess model performance under extreme climate conditions**: study how well the two models predict NEE under extreme heat, cold, wet, and dry conditions.
4. **Evaluate how model performance changes with sample size**: explore the impact of different training sample sizes on NEE prediction.

### Background and Significance

The net ecosystem CO2 exchange (NEE) is a key parameter for quantifying terrestrial ecosystems and their contribution to ongoing climate change. As ecological data accumulate, more advanced quantitative methods are needed to assist NEE prediction. The paper applies two widely used machine-learning algorithms, RF and XGBoost, to build models on the FLUXNET dataset that simulate NEE in major biomes.

### Main Findings

1. **Model Performance**:
   - Both models accurately predict NEE in all biomes, but XGBoost is computationally more efficient (6 to 62 times faster than RF).
   - Prediction is best in forest ecosystems (DBF, EBF, ENF, MF; R² from 0.59 to 0.81), followed by savannas (SAV and WSA; R² from 0.57 to 0.61), then grasslands (R² = 0.55), and finally croplands (R² = 0.43).
   - Results for the two shrubland types differ widely: R² for CSH (0.75) is much higher than for OSH (0.35).
2. **Importance of Environmental Variables**:
   - Net solar radiation (NETRAD), soil water content (SWC), and soil temperature (TS) are the most important variables.
   - Precipitation (P) and wind speed (WS) are less important in simulating temporal variations of site-level NEE.
3. **Model Performance under Extreme Climate Conditions**:
   - Extreme heat and dryness substantially reduce model performance in grassland (extreme heat: R² = 0.66 to 0.71, normal: R² = 0.78 to 0.81; extreme dryness: R² = 0.14 to 0.30, normal: R² = 0.54 to 0.55).
   - Forests are less affected by extreme heat and dryness (extreme heat: R² = 0.50 to 0.78, normal: R² = 0.59 to 0.87; extreme dryness: R² = 0.86 to 0.90, normal: R² = 0.81 to 0.85).
   - Extreme wet conditions have little impact on model performance in forest ecosystems (R² changes by −0.03 to 0.03 relative to normal) but substantially reduce performance in cropland (R² decreases by 0.20 to 0.27).
   - Extreme cold conditions have little impact on model performance in forest and woody savannas (R² decreases by 0.01 to 0.08 and by 0.09, respectively).
4. **Impact of Sample Size on Model Performance**:
   - Both models need more than 2.5 years of daily training samples to reach good performance and more than 5.4 years to reach optimal performance.

### Conclusion

Both RF and XGBoost are suitable machine-learning algorithms for predicting ecosystem NEE, and XGBoost is the more feasible of the two in terms of accuracy and efficiency.
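
The extreme-versus-normal R² comparison reported above can be illustrated with a small sketch. It is not the authors' code: the data are synthetic, the predictions are stand-ins rather than RF or XGBoost output, and the 90th-percentile cutoff for "extreme heat" is an assumption made only for illustration. The helper names (`r_squared`, `percentile`, `split_by_extreme`, `demo`) are hypothetical.

```python
import math
import random
import statistics


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def percentile(values, q):
    """Nearest-rank percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def split_by_extreme(driver, threshold):
    """Return (normal, extreme) index lists; extreme means driver > threshold."""
    normal = [i for i, v in enumerate(driver) if v <= threshold]
    extreme = [i for i, v in enumerate(driver) if v > threshold]
    return normal, extreme


def demo(seed=0, n_days=1000):
    """Score synthetic daily predictions separately on normal and hot days."""
    rng = random.Random(seed)
    # Synthetic daily air temperature and NEE (assumed, not FLUXNET data).
    temperature = [rng.uniform(0.0, 35.0) for _ in range(n_days)]
    observed = [-0.2 * t + rng.gauss(0.0, 1.0) for t in temperature]
    # Assumed definition of extreme heat: days above the 90th percentile.
    threshold = percentile(temperature, 90)
    # Stand-in predictions: larger errors are injected on extreme days.
    predicted = [
        o + rng.gauss(0.0, 0.8 if t > threshold else 0.3)
        for o, t in zip(observed, temperature)
    ]
    normal, extreme = split_by_extreme(temperature, threshold)
    r2_normal = r_squared(
        [observed[i] for i in normal], [predicted[i] for i in normal]
    )
    r2_extreme = r_squared(
        [observed[i] for i in extreme], [predicted[i] for i in extreme]
    )
    return r2_normal, r2_extreme


if __name__ == "__main__":
    r2_normal, r2_extreme = demo()
    print(f"normal R2 = {r2_normal:.2f}, extreme-heat R2 = {r2_extreme:.2f}")
```

Scoring each subset against its own mean is one reason R² can fall sharply under extremes: the extreme subset spans a narrow range of the driver, so the observed variance is small and the same absolute error costs more R².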