Abstract
PM2.5 and O3 are two common air pollutants. Many methods exist for forecasting and analyzing air pollution, but most suffer from insufficiently trained classifiers, overfitting, and poor interpretability. To address these issues, this paper uses gradient boosting algorithms and SHAP to predict and analyze PM2.5 and O3 concentrations, and compares the prediction accuracy of three gradient boosting tree algorithms (LightGBM, GBR, and XGBoost) on air pollutant and meteorological data for Shijiazhuang from January 1, 2015 to December 31, 2021. The experimental results show that XGBoost predicts PM2.5 more accurately than LightGBM and GBR, while LightGBM outperforms GBR and XGBoost on every evaluation metric for O3. Finally, the model interpretation method SHAP is used to identify the key factors affecting PM2.5 and O3 concentrations: PM10, CO, and SO2 significantly affect PM2.5 concentration, while NO2 and temperature (TEMP) contribute most to O3 concentration.
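The workflow summarized in the abstract (fit a gradient boosting regressor, then attribute each prediction to its input features with Shapley values) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: it does not call LightGBM, XGBoost, or the SHAP library, but swaps in a hand-rolled booster built from one-split trees and a brute-force Shapley computation. The data are synthetic, the coefficients in the target are invented, and the names PM10, CO, and TEMP are borrowed only as column labels.

```python
import itertools
import math
import random

FEATURES = ["PM10", "CO", "TEMP"]  # labels only; the data below are synthetic

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_gbr(X, y, n_rounds=30, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)

def shapley_values(model, x, background):
    """Exact Shapley values; absent features are averaged over the background rows."""
    n = len(x)

    def value(subset):
        total = 0.0
        for b in background:
            z = [x[i] if i in subset else b[i] for i in range(n)]
            total += predict(model, z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Synthetic data (assumption): the target depends on the first two columns only.
random.seed(0)
X = [[random.uniform(20, 200), random.uniform(0.2, 3.0), random.uniform(-5, 35)]
     for _ in range(100)]
y = [0.5 * pm10 + 15.0 * co + random.gauss(0, 3) for pm10, co, temp in X]

model = fit_gbr(X, y)
background = X[:20]
x0 = X[50]
phi = shapley_values(model, x0, background)
baseline = sum(predict(model, b) for b in background) / len(background)

for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.2f}")
print(f"baseline + contributions = {baseline + sum(phi):.2f}, "
      f"prediction = {predict(model, x0):.2f}")
```

The last line prints the two sides of the Shapley efficiency property: the contributions sum to the gap between the prediction for one sample and the average prediction over the background rows. Brute-force enumeration scales exponentially with the number of features, so for real data a tree-specific explainer such as the one provided by the SHAP library is the practical choice.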
Authors
潘梦瑶
任瑛
王思源
夏必胜
PAN Mengyao; REN Ying; WANG Siyuan; XIA Bisheng (School of Mathematics and Computer Science, Yan'an University, Yan'an 716000)
Source
《环境科学学报》
CAS
CSCD
PKU Core (Peking University Core Journals)
2024, Issue 7, pp. 402-409 (8 pages)
Acta Scientiae Circumstantiae
Funding
Yan'an Science and Technology Bureau Project (No. 203010096)
Yan'an University Institutional Project (No. 205040306)