Funding: This research was supported in part by the National Natural Science Foundation of China under grant Nos. 61602202 and 61603146, the Natural Science Foundation of Jiangsu Province under contracts BK20160428 and BK20160427, the Six Talent Peaks project in Jiangsu Province under contract XYDXX-034, and the project in Jiangsu Association for Science and Technology.
Abstract: Air quality prediction is an important part of environmental governance. The accuracy of air quality prediction also affects how people plan outdoor activities. Mining effective information from historical air pollution data, while discarding unimportant factors, to predict how pollution changes is of great significance for pollution prevention, pollution control, and early warning. In this paper, we take into account that air pollutants follow different trends and that different climatic factors affect air pollutants differently. First, air pollutant data from different cities are collected with a sliding window technique, and the data of the different cities within the sliding window are clustered by the Kohonen method to find shared trends in air pollutants. On this basis, combined with weather data, we use the ReliefF method to extract the climate-factor features that are helpful for prediction. Finally, the different types of air pollutants and their corresponding extracted climate-factor features are used to train different sub-models. Experimental results for different algorithms and different air pollutants show that this method improves not only the accuracy of air quality prediction but also the operating efficiency.
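The sliding-window and Kohonen clustering steps described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the 1-D map topology, the linear weight initialisation, and the decay schedules are all hypothetical choices, and the ReliefF step is not shown.

```python
import numpy as np

def sliding_windows(series, width, step=1):
    """Return a 2-D array whose rows are consecutive windows of `series`."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])

def train_som(data, n_units=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small 1-D Kohonen map (assumed topology) and return unit weights."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise units on a line from the per-feature min to max.
    weights = np.linspace(data.min(axis=0), data.max(axis=0), n_units)
    positions = np.arange(n_units)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 1e-3)   # neighbourhood width shrinks
        for x in data[rng.permutation(len(data))]:
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Gaussian neighbourhood on the map grid around the BMU.
            h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def assign_clusters(data, weights):
    """Label each row of `data` with the index of its nearest map unit."""
    data = np.asarray(data, dtype=float)
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

In use, each row passed to `train_som` would hold one city's pollutant readings inside the current window, so that cities mapped to the same unit are treated as sharing a trend.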
Abstract: Air pollution is a severe environmental problem in urban areas. Accurate air quality prediction can help governments and individuals make proper decisions to cope with potential air pollution. As a classic time series forecasting model, the AutoRegressive Integrated Moving Average (ARIMA) has been widely adopted in air quality prediction. However, because of the volatility of air quality and the lack of additional context information, i.e., the spatial relationships among monitoring stations, traditional ARIMA models suffer from unstable prediction performance. Although some deep networks can achieve higher accuracy, they require large amounts of training data, heavy computation, and considerable time. In this paper, we propose a hybrid model to simultaneously predict seven air pollution indicators from multiple monitoring stations. The proposed model consists of three components: (1) an extended ARIMA to predict matrix series of multiple air quality indicators from several adjacent monitoring stations; (2) the Empirical Mode Decomposition (EMD) to decompose the air quality time series data into multiple smooth sub-series; and (3) the truncated Singular Value Decomposition (SVD) to compress and denoise the expanded matrix. Experimental results on a public dataset show that the proposed model outperforms state-of-the-art air quality forecasting models in both accuracy and time cost.
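As a rough illustration of the third component only, the sketch below applies a truncated SVD to a matrix and keeps its largest singular values. The function name and the choice of rank are hypothetical, the abstract does not say how the rank is selected, and the extended ARIMA and EMD components are not shown.

```python
import numpy as np

def truncated_svd(matrix, rank):
    """Return the best rank-`rank` approximation of `matrix` in the Frobenius norm."""
    u, s, vt = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    # Keep only the leading `rank` singular triplets; the discarded ones
    # carry the low-energy part of the matrix, which is treated as noise.
    return (u[:, :rank] * s[:rank]) @ vt[:rank]
```

The Frobenius-norm error of this approximation equals the square root of the sum of the squared discarded singular values, which gives one way to judge how much information a given rank throws away.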
Funding: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) Program (IITP-2020-0-01816) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation). This research was also supported by a National Research Foundation (NRF) of Korea Grant funded by the Korean Government (MSIT) (No. 2021R1A4A3022102).
Abstract: Given the increasing number of countries reporting degraded air quality, effective air quality monitoring has become a critical issue in today's world. However, current air quality observatory systems are often prohibitively expensive, resulting in a lack of observatories in many regions within a country. Consequently, not every region receives the same level of air quality information. This disparity occurs because some locations have to rely on information from observatories located far away, even when those are the closest available options. To address this challenge, we propose a novel approach that leverages machine learning and deep learning techniques to forecast fine dust concentrations. Specifically, continuous location features in the form of latitude and longitude values are incorporated into our models. Using a comprehensive dataset comprising weather conditions, air quality measurements, and location properties, we trained several machine learning models, including Random Forest Regression, XGBoost Regression, and AdaBoost Regression, as well as a deep learning model, Long Short-Term Memory (LSTM). Our experimental results demonstrate that the LSTM model outperforms the other models, achieving the best score with a root mean squared error of 23.48 in predicting fine dust (PM10) concentrations on an hourly basis. Furthermore, we found that incorporating location properties, such as longitude and latitude values, enhances the overall quality of the regression models. We also discuss the implications and contributions of our research. With our approach, the cost associated with relying solely on existing observatories can be substantially reduced. This reduction in costs can pave the way for economically efficient fine dust observation systems, ensuring more widespread and accurate air quality monitoring across different regions.
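To make the evaluation metric and the location features concrete, here is a minimal sketch. The helper names and the feature layout are assumptions made for illustration; the abstract does not describe how the input rows were actually assembled.

```python
import math

def add_location(features, latitude, longitude):
    """Append continuous latitude/longitude values to a feature row (hypothetical layout)."""
    return list(features) + [float(latitude), float(longitude)]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

RMSE is expressed in the same unit as the target, so a lower value means the hourly PM10 predictions sit closer to the measured concentrations.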
Funding: Supported by the Taishan Scholars (No. ts201712003).
Abstract: Nitrogen dioxide (NO₂) poses a critical potential risk to environmental quality and public health. A reliable machine learning (ML) forecasting framework can provide valuable information to support government decision-making. Based on data from 1609 air quality monitors across China from 2014 to 2020, this study designed an ensemble ML model by integrating multiple types of spatial-temporal variables and three sub-models for time-sensitive prediction over a wide range. The ensemble ML model adds a residual connection to the gated recurrent unit (GRU) network and combines the strengths of Transformer, extreme gradient boosting (XGBoost), and the GRU network with residual connection, resulting in a 4.1% ± 1.0% lower root mean square error than XGBoost on the test results. The ensemble model shows strong prediction performance, with coefficients of determination of 0.91, 0.86, and 0.77 for 1-hr, 3-hr, and 24-hr averages on the test results, respectively. In particular, the model achieves excellent performance with low spatial uncertainty in Central, East, and North China, the major site-dense zones. Through interpretability analysis based on the Shapley value at different temporal resolutions, we found that the contribution of atmospheric chemical processes is more important for hourly predictions than for daily-scale predictions, while the impact of meteorological conditions is more prominent for the latter. Compared with existing models for different spatiotemporal scales, the present model can be implemented at any air quality monitoring station across China to provide rapid and dependable forecasts of NO₂, which will help in developing effective control policies.
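The two reported metrics can be written down in a few lines. The sketch below also includes a plain average over sub-model outputs purely as a placeholder: the abstract does not state how the three sub-models are combined, so `average_ensemble` is an assumption and not the authors' method.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def relative_rmse_reduction(baseline_rmse, model_rmse):
    """Fractional RMSE improvement over a baseline (0.041 means 4.1% lower)."""
    return (baseline_rmse - model_rmse) / baseline_rmse

def average_ensemble(sub_model_predictions):
    """Assumed combination rule: element-wise mean of the sub-model outputs."""
    return [sum(values) / len(values) for values in zip(*sub_model_predictions)]
```

A coefficient of determination of 1 means the predictions match the observations exactly, while 0 means they do no better than always predicting the observed mean.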
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 72004144 and 71971007, and the Fundamental Research Funds for the Beijing Municipal Colleges and Universities in Capital University of Economics and Business under Grant No. XRZ2020026.
Abstract: In the age of big data, Internet big data can finely reflect public attention to air pollution, which greatly impacts ambient PM2.5 concentrations; however, it has not yet been applied to PM2.5 prediction. Therefore, this study introduces such informative Internet big data as an effective predictor for PM2.5, in addition to other big data. To capture the multi-scale relationship between PM2.5 concentrations and multi-source big data, a novel multi-source big data and multi-scale forecasting methodology is proposed for PM2.5. Three major steps are taken: 1) multi-source big data processing, to collect big data from different sources (e.g., devices and the Internet) and extract the hidden predictive features; 2) multi-scale analysis, to address the non-uniformity and non-alignment of timescales by extracting the scale-aligned modes hidden in the multi-source data; 3) PM2.5 prediction, entailing individual prediction at each timescale and ensemble prediction for the final results. The empirical study focuses on the most highly polluted cities and shows that the proposed multi-source big data and multi-scale forecasting method outperforms, in accuracy, its original forms (with neither big data nor multi-scale analysis), semi-extended variants (with big data but without multi-scale analysis), and similar counterparts (with multi-scale analysis but with big data from a single source).
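The decompose, predict per timescale, then combine pattern in steps 2 and 3 can be illustrated with a toy example. Everything here is a stand-in: a trailing moving average replaces the (unspecified) multi-scale decomposition, the per-scale predictors are deliberately naive, and the ensemble is a simple sum, none of which is claimed to match the paper.

```python
def moving_average(series, window):
    """Trailing moving average; early points average whatever history exists."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def decompose(series, window):
    """Split a series into a slow mode (trend) and a fast mode (detail)."""
    trend = moving_average(series, window)
    detail = [x - t for x, t in zip(series, trend)]
    return trend, detail

def forecast_next(series, window=3):
    """Predict each mode separately, then sum the two predictions."""
    trend, detail = decompose(series, window)
    trend_pred = trend[-1] + (trend[-1] - trend[-2])   # extend the last slope
    detail_pred = sum(detail[-window:]) / window       # mean of recent detail
    return trend_pred + detail_pred
```

By construction the two modes add back up to the original series, so the per-scale predictions can be summed without losing or double-counting any part of the signal.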