An Evaluation of Four Threshold Selection Methods in Species Occurrence Modelling with Random Forest: Case Studies with Davidia involucrata and Cunninghamia lanceolata
ZHANG Lei, WANG Lin-lin, LIU Shi-Rong, SUN Peng-Sen, YU Zhen, HUANG Shu-Tao, ZHANG Xu-Dong
DOI: https://doi.org/10.17521/cjpe.2016.0184
2017-01-01
Abstract: Aims Predictive species distribution models (SDMs) are increasingly applied in resource assessment, environmental conservation and biodiversity management. However, most SDMs yield a predicted probability (suitability) surface. In conservation and environmental management practice, information presented as species presence/absence (binary) may be more practical than information presented as probability or suitability. A threshold is therefore needed to transform the probability or suitability data into presence/absence data. However, little is known about the effects of different threshold-selection methods on model performance and on the species range changes projected under future climate. Of the numerous SDMs, random forest (RF) can produce both probabilistic and binary species distribution maps, based on its regression and classification algorithms, respectively. Comparative tests of the performance of the RF regression and classification algorithms have not been reported.
Methods Here, RF was used to simulate the current and project the future potential distributions of Davidia involucrata and Cunninghamia lanceolata. Four threshold-setting methods (Default 0.5, MaxKappa, MaxTSS and MaxACC) were then used to transform modelled probabilities of occurrence into binary predictions of species presence and absence. Lastly, we investigated the differences in model performance among the threshold selection methods using five model accuracy measures (Kappa, TSS, overall accuracy, sensitivity and specificity). We also used the map similarity measure Kappa for a cell-by-cell comparison of the similarities and differences between distribution maps under current and future climates.
Important findings We found that the choice of threshold method altered estimates of model performance, of species habitat suitable area and of species range shifts under future climate. The difference in selected threshold cut-offs among the four threshold methods was significant for D. involucrata, but not for C. lanceolata. Species' geographic ranges changed in response to climate change, in both area and shift distance. The projections of the four threshold methods did not differ significantly from one another in the magnitude or direction of change, although they did differ from the RF classification predictions. The pairwise similarity analysis of binary maps indicated that spatial correspondence among prediction maps was highest between MaxKappa and MaxTSS, and lowest between the RF classification algorithm and the four threshold-setting methods. We argue that MaxTSS and MaxKappa are promising methods for threshold selection when the RF regression algorithm is used for species distribution modelling. This study also provides insights into the uncertainty introduced by threshold selection in species distribution modelling.
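The threshold-setting step described in the abstract can be illustrated with a minimal sketch. It assumes that predicted probabilities of occurrence (for example from an RF regression) and observed presence/absence labels are already available as plain lists. The abstract does not give the authors' implementation, so all function names, the cut-off grid of 0.01 and the tie-breaking rule (the lowest cut-off wins) are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def confusion(probs: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn); a cell is predicted present if prob >= cutoff."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        predicted_present = p >= cutoff
        if predicted_present and y:
            tp += 1
        elif predicted_present:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the five accuracy measures named in the abstract."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (acc - chance) / (1 - chance) if chance < 1 else 0.0
    return {
        "Kappa": kappa,
        "TSS": sensitivity + specificity - 1,  # true skill statistic
        "ACC": acc,                            # overall accuracy
        "Sensitivity": sensitivity,
        "Specificity": specificity,
    }


def select_thresholds(probs: Sequence[float], labels: Sequence[int],
                      n_steps: int = 100) -> Dict[str, float]:
    """Pick one cut-off per method by scanning an evenly spaced grid on [0, 1]."""
    cutoffs: List[float] = [i / n_steps for i in range(n_steps + 1)]
    scored = [(c, accuracy_measures(*confusion(probs, labels, c)))
              for c in cutoffs]
    chosen = {"Default 0.5": 0.5}
    for method, measure in (("MaxKappa", "Kappa"),
                            ("MaxTSS", "TSS"),
                            ("MaxACC", "ACC")):
        # max() keeps the first maximum, so ties go to the lowest cut-off.
        chosen[method] = max(scored, key=lambda item: item[1][measure])[0]
    return chosen


def map_kappa(map_a: Sequence[int], map_b: Sequence[int]) -> float:
    """Cell-by-cell Kappa agreement between two binary (0/1) maps."""
    return accuracy_measures(*confusion(map_a, map_b, 0.5))["Kappa"]


if __name__ == "__main__":
    # Toy data, not taken from the study.
    probs = [0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    for method, cutoff in select_thresholds(probs, labels).items():
        print(method, cutoff, accuracy_measures(*confusion(probs, labels, cutoff)))
```

Applying `map_kappa` to two binary maps flattened to lists of cells, for instance a current and a future projection, gives the kind of cell-by-cell similarity comparison the abstract mentions. In practice the cut-offs would normally be selected on evaluation data held out from model fitting.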