Model bias in calculating factor importance of climate on vegetation growth
Boyi Liang, Hongyan Liu, Shaopeng Wang, Elizabeth L. Cressey, Cecilia A.L. Dahlsjö, Chongyang Xu, Jia Wang, Zong Wang, Feng Liu, Siwen Feng, Liang Shi, Jingyu Dai, Jing Cao, Fufu Li, Timothy A. Quine
DOI: https://doi.org/10.1016/j.gloplacha.2023.104209
IF: 4.956
2023-08-07
Global and Planetary Change
Abstract: Machine learning is increasingly used to study vegetation growth; however, more often than not, prediction and simulation are prioritized over quantitative estimates of the drivers of vegetation growth, such as climate. In this paper, we systematically investigate, for the first time, the model bias in calculating the factor importance of climate on vegetation growth, especially when various kinds of machine learning models are considered. We undertake two case studies, simulating remote sensing and ground-based research scenarios, in which the differences in the quantitative relationships between climate and vegetation are evaluated across multiple models. We found that model complexity increased the coefficient of determination (R²) but reduced the absolute importance of the preselected independent variables. As fitting accuracy increased, the absolute factor importance of the dominant factor and of all other influencing factors decreased simultaneously, and the factor importance calculated by different models tended to be more normally distributed across the study region. The reduction in factor importance was accompanied by an increased effect of model selection: the model used to estimate vegetation growth played a larger role in producing the factor importance (variance analysis: remote sensing scenario, F = 555.2; ground-based scenario, F = 30.8) than the climate variables did (variance analysis: remote sensing scenario, F = 460.8; ground-based scenario, F = 28.8). Critically, for the machine learning models with the highest fitting accuracy, the resulting factor importance of the climate factors differed less from that of a random factor. In contrast, the relative factor influence among the selected climate factors was more robust and reliable (in the variance analysis, the model had no significant effect on the resulting factor importance).
For 5 of the 8 models, the dominant factor (temperature) had a relative influence above 0.85, ranging from 0.88 to 0.99. Based on these results, we suggest testing the stability of factor contributions in future studies before drawing conclusions, particularly when using machine learning models in ecological research and dealing with numerous factors. The balance between simple and accurate models is contested, and we believe that our study will contribute to a better understanding of the data behind this debate.
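The contrast between absolute and relative factor importance can be illustrated with a minimal Python sketch. It assumes that relative influence is each factor's share of the summed absolute importance; the abstract does not state the exact definition, so this is an assumption. The function name `relative_influence`, the factor names other than temperature, and all numeric values are hypothetical and are not taken from the study.

```python
def relative_influence(absolute):
    """Return each factor's share of the total absolute importance.

    Assumed definition: share_i = importance_i / sum of all importances.
    """
    total = sum(absolute.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return {name: value / total for name, value in absolute.items()}


# Hypothetical absolute importances from a simple model.
simple_model = {"temperature": 0.60, "precipitation": 0.06, "radiation": 0.04}

# Hypothetical complex model: every absolute importance shrinks by the
# same proportion, so the absolute values fall but the ranking is kept.
complex_model = {name: value * 0.25 for name, value in simple_model.items()}

rel_simple = relative_influence(simple_model)
rel_complex = relative_influence(complex_model)

for name in simple_model:
    print(name, round(rel_simple[name], 3), round(rel_complex[name], 3))
```

Under this assumed definition, a uniform drop in absolute importance leaves the relative shares unchanged, which is one way the relative measure could remain stable across models while the absolute measure does not.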
geosciences, multidisciplinary; geography, physical