Abstract
To improve the accuracy of soil organic matter (SOM) estimation from sparse samples, this study built SOM prediction models with two machine learning methods, GRNN (Generalized Regression Neural Network) and RF (Random Forest). Using the 2007 SOM sampling data from agricultural land in Daxing District, Beijing, the samples were thinned according to the MMSD (Minimization of the Mean of the Shortest Distances) criterion into 8 sample sets of different sampling densities (2703, 1352, 676, 339, 169, 85, 43, and 22 samples). GRNN, RF, and Ordinary Kriging were each applied to predict the unsampled points at every sampling density, and cross-validation was used to assess the prediction accuracy at each density. As sampling density decreases, the spatial autocorrelation between sample points gradually weakens, the fitting accuracy of the semivariogram deteriorates, prediction errors increase, and prediction confidence decreases. When the samples are thinned to 43 and 22 points, the spatial autocorrelation between sample points nearly vanishes, and the fitted semivariogram has a low coefficient of determination and large residuals. Ordinary Kriging is clearly affected by the number of sample points, the sampling density, and the spatial structure of the samples: its prediction accuracy declines as the number of sample points decreases, and at 85 sample points or fewer its predicted values show no significant correlation with the observed values. The prediction accuracy of GRNN and RF is only slightly affected by sampling density and fluctuates within a narrow range; their predicted values oscillate around the observed values within a limited band and remain well correlated with them. At 85 sample points or fewer, both methods substantially improve prediction accuracy over Ordinary Kriging. Ordinary Kriging is therefore unsuitable for spatial interpolation with sparse samples, especially when spatial autocorrelation is weak. The machine learning models can fully learn the environmental information of the soil and the spatial proximity effects among sample points, taking both attribute similarity and spatial autocorrelation into account. They show better stability and adaptability, are less sensitive to the number, configuration, and density of sample points, and can deliver stable prediction accuracy even when the spatial autocorrelation between sample points is very weak.
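The abstract names GRNN as one of the two machine learning predictors and says accuracy was checked by cross-validation, but it does not give either in detail. As a rough illustration only, the sketch below implements the standard GRNN estimator (Specht's formulation: a Gaussian-kernel weighted mean of the training targets) and scores it with leave-one-out cross-validation. Leave-one-out is an assumed choice of cross-validation scheme; the feature vector (two coordinates plus one environmental covariate), the toy data, the candidate smoothing factors, and the function names `grnn_predict` and `loocv_rmse` are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def grnn_predict(train_x: Sequence[Sequence[float]],
                 train_y: Sequence[float],
                 query: Sequence[float],
                 sigma: float) -> float:
    """Standard GRNN estimate (Specht):
    y_hat(x) = sum_i y_i * exp(-D_i^2 / (2 sigma^2)) / sum_i exp(-D_i^2 / (2 sigma^2)),
    where D_i is the Euclidean distance from the query to training pattern i."""
    # Squared Euclidean distance from the query to every training pattern.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    # Shift by the smallest distance before exponentiating so the largest weight is 1;
    # the constant factor cancels in the ratio and avoids 0/0 when sigma is small.
    d_min = min(d2)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


def loocv_rmse(xs: List[List[float]], ys: List[float], sigma: float) -> float:
    """Leave-one-out cross-validation RMSE (assumed scheme): each sample is
    predicted from all the other samples and the errors are pooled."""
    if len(xs) < 2:
        raise ValueError("need at least two samples for leave-one-out")
    sq_err = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = grnn_predict(train_x, train_y, xs[i], sigma)
        sq_err += (pred - ys[i]) ** 2
    return math.sqrt(sq_err / len(xs))


if __name__ == "__main__":
    # Hypothetical toy data, NOT from the paper: (easting, northing, covariate) -> SOM (g/kg).
    # Real inputs should be standardized first, because one sigma is shared by all features.
    xs = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.2], [0.0, 1.0, 0.8], [1.0, 1.0, 1.1], [2.0, 1.0, 0.9]]
    ys = [12.1, 13.0, 11.4, 12.8, 11.9]
    # Choose the smoothing factor with the lowest leave-one-out RMSE over a small grid.
    best_rmse, best_sigma = min((loocv_rmse(xs, ys, s), s) for s in (0.1, 0.3, 0.5, 1.0, 2.0))
    print(f"best sigma = {best_sigma}, LOOCV RMSE = {best_rmse:.3f}")
```

Because the GRNN output is a weighted mean with positive weights, every prediction stays within the range of the training targets; the only tuning parameter is the smoothing factor σ, which is why it is selected here by cross-validated error.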
Authors
刘明杰
徐卓揆
郜允兵
杨晶
潘瑜春
高秉博
周艳兵
周万鹏
王凌
LIU Mingjie; XU Zhuokui; GAO Yunbing; YANG Jing; PAN Yuchun; GAO Bingbo; ZHOU Yanbing; ZHOU Wanpeng; WANG Ling
School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha 410114, China
Beijing Research Center for Information Technology in Agriculture, Beijing 100097, China
Engineering Laboratory of Spatial Information Technology of Highway Geological Disaster Early Warning in Hunan Province (Changsha University of Science & Technology), Changsha 410114, China
National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
China Agricultural University, Beijing 100083, China
Henan Polytechnic University, Jiaozuo 454003, China
Institute of Agricultural Resources and Environment, Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051, China
Source
Journal of Geo-information Science (《地球信息科学学报》)
CSCD
Peking University Core Journal
2020, No. 9, pp. 1799-1813, 15 pages
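Funding
National Key Research and Development Program of China (2017YFD0801205)
Special Project for Building Scientific and Technological Innovation Capacity, Beijing Academy of Agriculture and Forestry Sciences (KJCX20170407, KJCX20200414)
Scientific Research Project of the Education Department of Hunan Province (13B129)
Open Fund Project of the Hunan Provincial Engineering Laboratory (KFJ180602)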
Keywords
soil organic matter
spatial interpolation
machine learning
attribute similarity
spatial autocorrelation
Daxing District
sparse samples
sampling density