
Soil Cadmium Prediction and Health Risk Assessment of an Oasis on the Eastern Edge of the Tarim Basin Based on Feature Optimization and Machine Learning
Abstract Soil heavy metal pollution poses a serious threat to food security, human health, and soil ecosystems. Based on 644 soil samples collected from a typical oasis on the eastern margin of the Tarim Basin, five models were built to predict soil heavy metal content: multiple linear regression (LR), back-propagation neural network (BP), random forest (RF), support vector machine (SVM), and radial basis function neural network (RBF). The best prediction result was then used to analyze the spatial distribution of heavy metal contamination and the associated health risks. The results showed that: ① The mean ω(Cd) in the study area was 0.14 mg·kg^(-1), which is 1.17 times the soil background value of Xinjiang, making Cd the primary factor of soil heavy metal contamination in the area. The carcinogenic risk coefficients of Cd for both adults and children were less than 10^(-4), indicating no significant long-term health risk to humans. ② Among the five inversion models, the RF model had the highest validation-set R^(2) (0.7637) and the smallest RMSE, MAE, and MBE, so its predictions fitted the measured soil Cd content best. The spatial distribution of soil Cd predicted by the RF model was also in good agreement with the map interpolated from the measured sample points. ③ In predicting the health risk of soil Cd, the RF model was more accurate than the other four models for both adults and children. By contrast, the LR model's predicted values on the validation set varied widely, and its results were poor. In summary, the RF model showed good generalization and resistance to overfitting and was the best model for predicting soil Cd content and evaluating health risk in the study area.
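The four validation metrics named in the abstract (R^(2), RMSE, MAE, MBE) can be sketched as follows. This is a minimal illustration and not the authors' code: the record does not give the exact definitions used, so the standard forms are assumed, MBE is assumed to be the mean of predicted minus observed, and the function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAE and MBE for paired observed/predicted values.

    Assumed conventions (not confirmed by the source):
    residual = predicted - observed, so a positive MBE means over-prediction.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares

    # R2 is undefined when all observed values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {
        "R2": r2,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MBE": sum(residuals) / n,
    }


if __name__ == "__main__":
    # Illustrative Cd contents in mg/kg; these are NOT data from the study.
    measured = [0.10, 0.12, 0.14, 0.16, 0.18]
    modelled = [0.11, 0.12, 0.13, 0.17, 0.18]
    for name, value in regression_metrics(measured, modelled).items():
        print(f"{name}: {value:.4f}")
```

Under these conventions, a model is preferred when its validation-set R^(2) is higher and its RMSE and MAE are lower; MBE is signed, so it is its magnitude that is compared.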
Authors LIU Jing-yu; LI Ruo-yi; LIANG Yong-chun; LIU Lei; YIN Fang; TANG Su; HE Lin-sen; ZHANG Yi (School of Earth Science and Resources, Chang'an University, Xi'an 710054, China; Center of Urumqi Comprehensive Survey Natural Resources, China Geological Survey, Urumqi 830057, China; China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing 100083, China; School of Land Engineering, Chang'an University, Xi'an 710054, China; Xi'an Mineral Resources Survey Centre, China Geological Survey, Xi'an 710100, China)
Source Environmental Science (《环境科学》; indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2024, Issue 8, pp. 4802-4811 (10 pages)
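Funding National Natural Science Foundation of China (42071258); Fundamental Research Funds for the Central Universities (300102353501, 300202222009); China Geological Survey Project (DD20191026).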
Keywords cadmium (Cd); content prediction; health risk assessment; machine learning; feature optimization