Abstract
The industrial sewage treatment process is complex and markedly nonlinear, and biochemical oxygen demand (BOD), a key water quality index, often cannot be monitored online in real time. To address this, a combined covariance Gaussian process regression (CGPR) modeling method is proposed to build a BOD prediction model that captures both the linear and the nonlinear relationships between the primary (leading) variable and the auxiliary variables in sewage data. Because the auxiliary variables in the sewage treatment process are nonlinear and correlated with one another, the random forest (RF) algorithm is first used to assign an importance score to each auxiliary variable. The variables with high importance scores are then selected as inputs to the CGPR model to improve its prediction accuracy. Finally, a simulation study is carried out on sewage treatment process data. The results show that the proposed RF-CGPR method achieves good prediction performance, and that the model combining the linear covariance with the Matern covariance has a comparatively high prediction accuracy.
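The abstract describes two stages: RF importance scoring to choose auxiliary variables, then a GPR whose covariance combines a linear term with a Matern term. Below is a minimal scikit-learn sketch of that pipeline. It is not the authors' implementation and rests on several assumptions. The covariances are combined by summation. The Matern smoothness is fixed at `nu=2.5`. Importance is scikit-learn's impurity-based `feature_importances_`, whereas the paper may use another RF importance measure such as a permutation-based one. A fixed top-k cutoff is used. The data are synthetic, not the paper's sewage dataset. The helper names `select_by_rf_importance` and `fit_cgpr` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, Matern, WhiteKernel
from sklearn.metrics import mean_squared_error


def select_by_rf_importance(X, y, k, seed=0):
    """Hypothetical helper: rank auxiliary variables by RF impurity importance, keep the top k."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # column indices, most important first
    return np.sort(order[:k]), rf.feature_importances_


def fit_cgpr(X, y, seed=0):
    """Hypothetical helper: GPR with an assumed additive linear + Matern covariance."""
    kernel = (
        ConstantKernel(1.0) * DotProduct(sigma_0=1.0)  # linear covariance term
        + ConstantKernel(1.0) * Matern(length_scale=np.ones(X.shape[1]), nu=2.5)  # nonlinear term
        + WhiteKernel(noise_level=1e-2)  # measurement noise
    )
    gpr = GaussianProcessRegressor(
        kernel=kernel, normalize_y=True, n_restarts_optimizer=2, random_state=seed
    )
    return gpr.fit(X, y)


# Synthetic stand-in data (assumption): 6 auxiliary variables; the target depends
# linearly on column 0 and nonlinearly on column 1, while columns 2-5 are irrelevant.
rng = np.random.default_rng(0)
n = 300
X = rng.uniform(-2.0, 2.0, size=(n, 6))
y = 1.5 * X[:, 0] + np.sin(2.0 * X[:, 1]) + 0.05 * rng.normal(size=n)

n_train = 200
# Variable selection uses the training split only, to avoid leaking test data.
selected, importances = select_by_rf_importance(X[:n_train], y[:n_train], k=2)

model = fit_cgpr(X[:n_train, selected], y[:n_train])
y_pred, y_std = model.predict(X[n_train:, selected], return_std=True)
rmse = mean_squared_error(y[n_train:], y_pred) ** 0.5
print("selected columns:", selected, "test RMSE:", round(rmse, 4))
```

Summing the two covariances lets the linear term carry the global trend while the Matern term absorbs local nonlinearity. The per-input Matern length scales are fitted together with the other hyperparameters when the marginal likelihood is maximized. Swapping `DotProduct` or `Matern` for other kernels is one way to compare the covariance combinations that the abstract alludes to.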
Authors
SUN Shun-yuan; LIU Kang-kang
School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China; Key Laboratory of Advanced Process Control for Light Industry of Ministry of Education, Jiangnan University, Wuxi 214122, China
Source
Control Engineering of China
CSCD
Peking University Core Journal
2021, No. 12, pp. 2336-2342 (7 pages)
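Funding
National Natural Science Foundation of China (61773182)
Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD)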
Keywords
variable selection
random forest
combined covariance
Gaussian process regression
sewage treatment