Abstract
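Species habitat models usually produce probabilistic predictions, whereas practical applications such as conservation management typically require binary (presence/absence) distribution maps. The probabilistic predictions must therefore be converted into binary values, which raises the problem of threshold selection. In addition, most measures used to evaluate model accuracy also require a threshold to convert the probabilistic predictions, and the choice of this threshold can strongly affect the estimated accuracy. Threshold selection nevertheless remains a seldom-studied aspect of uncertainty in species habitat modelling. Random forest (RF) can generate both probabilistic habitat maps (regression algorithm) and binary distribution maps (classification algorithm), yet no comparison of the two prediction modes has been reported. Taking Davidia involucrata and Cunninghamia lanceolata as examples, this study used the RF classification and regression algorithms to predict binary and probabilistic habitat maps, respectively. The probabilistic maps were converted into binary maps with four threshold-selection methods (default 0.5, MaxKappa, MaxTSS and MaxACC), and the effect of the conversion on model projections was then compared. The thresholds determined by the different methods differed significantly for D. involucrata but not for C. lanceolata; no significant difference in model accuracy was found for the two species. In predicting changes in habitat area, the direction and distance of habitat range shifts, and changes in the optimal elevation of both species under future climate, the binary-converted regression predictions differed significantly from the classification predictions, whereas the threshold-selection methods within the regression algorithm did not differ significantly from one another. Similarity analysis of the spatial habitat maps showed that the MaxKappa and MaxTSS methods were the most similar, and that the classification algorithm differed most from the four threshold-selection methods.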
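The threshold-selection step described above can be illustrated with a minimal sketch. It assumes that MaxKappa, MaxTSS and MaxACC each pick the cut-off that maximises Cohen's kappa, the true skill statistic or overall accuracy on a set of presence/absence records. The function names, the candidate grid and the toy data are hypothetical and are not taken from the study.

```python
from typing import Callable, Sequence, Tuple


def confusion(obs: Sequence[int], prob: Sequence[float], thr: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) when records with prob >= thr are called 'present'."""
    tp = fp = fn = tn = 0
    for o, p in zip(obs, prob):
        predicted_present = p >= thr
        if predicted_present and o:
            tp += 1
        elif predicted_present:
            fp += 1
        elif o:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Overall accuracy: share of records classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def tss(tp: int, fp: int, fn: int, tn: int) -> float:
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


def kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa: agreement beyond what chance alone would give."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected) if expected < 1.0 else 0.0


def best_threshold(obs: Sequence[int], prob: Sequence[float],
                   metric: Callable[[int, int, int, int], float],
                   candidates: Sequence[float]) -> float:
    """Return the candidate that maximises `metric` (the first one on ties)."""
    return max(candidates, key=lambda t: metric(*confusion(obs, prob, t)))


# Hypothetical toy data: observed presence (1) / absence (0) and predicted probability.
obs = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
candidates = [i / 100 for i in range(1, 100)]

for name, metric in (("MaxKappa", kappa), ("MaxTSS", tss), ("MaxACC", accuracy)):
    print(name, best_threshold(obs, prob, metric, candidates))
```

Several candidates often tie on a small sample; this sketch simply keeps the lowest tied cut-off, which is a design choice of the example rather than a rule stated in the abstract.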
Aims Predictive species distribution models (SDMs) are increasingly applied in resource assessment, environmental conservation and biodiversity management. However, most SDMs yield a predicted probability (suitability) surface map. In conservation and environmental management practice, information presented as species presence/absence (binary) may be more practical than information presented as probability or suitability. A threshold is therefore needed to transform the probability or suitability data into presence/absence data. However, little is known about the effects of different threshold-selection methods on model performance and on species range changes induced by future climate. Among the numerous SDMs, random forest (RF) can produce probabilistic and binary species distribution maps based on its regression and classification algorithms, respectively. Studies comparing the performance of the RF regression and classification algorithms have not been reported.
Methods Here, RF was used to simulate the current potential distributions of Davidia involucrata and Cunninghamia lanceolata and to project their future potential distributions. Then, four threshold-setting methods (Default 0.5, MaxKappa, MaxTSS and MaxACC) were selected and used to transform modelled probabilities of occurrence into binary predictions of species presence and absence. Lastly, we investigated the difference in model performance among the threshold-selection methods by using five model accuracy measures (Kappa, TSS, overall accuracy, sensitivity and specificity). We also used the map similarity measure, Kappa, for a cell-by-cell comparison of the similarities and differences of distribution maps under current and future climates.
Important findings We found that the choice of threshold method altered estimates of model performance, species suitable habitat area and species range shifts under future climate. The difference in selected threshold cut-offs among the four threshold methods was significant for D. involucrata, but not for C. lanceolata. Species' geographic ranges changed (in area and shifting distance) in response to climate change. The projections of the four threshold methods did not differ significantly from one another in the magnitude or direction of these changes, but they did differ significantly from the RF classification predictions. The pairwise similarity analysis of binary maps indicated that spatial correspondence among prediction maps was highest between MaxKappa and MaxTSS, and lowest between the RF classification algorithm and the four threshold-setting methods. We argue that MaxTSS and MaxKappa are promising methods for threshold selection when the RF regression algorithm is used for species distribution modeling. This study also provides insights into the uncertainty of threshold selection in species distribution modeling.
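The cell-by-cell map comparison mentioned in the Methods can be sketched as follows. The sketch assumes that the Kappa similarity measure is plain Cohen's kappa computed over all cells of two binary rasters of the same shape; the exact Kappa variant used in the study is not stated in the abstract, and the function name and toy maps are hypothetical.

```python
from typing import Sequence


def map_kappa(map_a: Sequence[Sequence[int]], map_b: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa between two binary (0/1) rasters of identical shape."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must have the same number of rows")
    both = only_a = only_b = neither = 0
    for row_a, row_b in zip(map_a, map_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same number of cells")
        for a, b in zip(row_a, row_b):
            if a and b:
                both += 1        # suitable in both maps
            elif a:
                only_a += 1      # suitable in map A only
            elif b:
                only_b += 1      # suitable in map B only
            else:
                neither += 1     # unsuitable in both maps
    n = both + only_a + only_b + neither
    if n == 0:
        raise ValueError("maps must contain at least one cell")
    observed = (both + neither) / n
    # Agreement expected by chance from each map's share of suitable cells.
    expected = ((both + only_a) * (both + only_b)
                + (neither + only_b) * (neither + only_a)) / (n * n)
    if expected == 1.0:
        return 1.0  # both maps are uniformly the same class, hence identical
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2 x 3 binary habitat maps (1 = suitable, 0 = unsuitable).
map_threshold = [[1, 1, 0], [0, 0, 0]]
map_classifier = [[1, 0, 0], [0, 0, 1]]
print(map_kappa(map_threshold, map_classifier))
```

A value near 1 indicates that two maps agree in almost every cell, while a value near 0 indicates agreement no better than chance, so computing this for every pair of maps would yield a pairwise similarity table of the kind summarized in the findings.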
Source
《植物生态学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2017, Issue 4, pp. 387-395 (9 pages)
Chinese Journal of Plant Ecology
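Funding
National Natural Science Foundation of China (41301056)
Fundamental Research Funds for the Central Non-profit Research Institutions (CAFYBB2014QB006 and RIF2012-04)
Forestry Soft Science Research Project (2016-R21)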
Keywords
threshold
probability habitat map
binary habitat map
random forest
Davidia involucrata
Cunninghamia lanceolata