Abstract
An improved ensemble learning algorithm based on random sampling and cluster sampling was developed to predict the number of circulating pumps in operation in a desulfurization tower. The database was built from a large volume of desulfurization data from an operating power plant. Outliers and non-steady-state values were removed from the initial database, and the parameters with high correlation coefficients with the output were extracted as model inputs. The study addressed three problems: imbalanced samples in classification, the lack of evaluation criteria for selecting preferred samples, and desulfurization optimization. Results showed that the improved ensemble learning model increased overall prediction accuracy by 33% compared with the original model, and that cluster sampling performed slightly better than random sampling. The recall for individual classes was also analyzed by comparing the recall of different algorithms on the minority and majority classes. Both improved sampling methods substantially improved minority-class prediction, with recall above 90%, and also gave some improvement for the majority class. Finally, the differences in sample distribution and model accuracy when the pump combination was used as the model output were discussed.
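The abstract does not give the base learner or the exact sampling procedure, so the following is only a minimal sketch of the general idea: train several learners on class-balanced random subsets, combine them by majority vote, and report recall per class. The nearest-centroid learner, the function names, and the synthetic two-feature data are all assumptions made for illustration, not the authors' method. A cluster-based variant would differ only in how the majority-class rows are drawn (for example, sampling from clusters of the majority class instead of uniformly).

```python
import random
from collections import Counter, defaultdict


def balanced_subsets(X, y, n_subsets, seed=0):
    """Yield class-balanced subsets by randomly undersampling every class
    down to the size of the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    n_min = min(len(rows) for rows in by_class.values())
    for _ in range(n_subsets):
        Xs, ys = [], []
        for label, rows in by_class.items():
            for features in rng.sample(rows, n_min):
                Xs.append(features)
                ys.append(label)
        yield Xs, ys


def fit_centroids(X, y):
    """Toy base learner (an assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, features):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist2(model[label]))


def ensemble_predict(models, features):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroids(m, features) for m in models)
    return votes.most_common(1)[0][0]


def recall_per_class(y_true, y_pred):
    """Recall for each class: correctly predicted samples / samples of that class."""
    total, hit = Counter(y_true), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            hit[t] += 1
    return {label: hit[label] / total[label] for label in total}


if __name__ == "__main__":
    # Synthetic, imbalanced data (NOT plant data): label = number of pumps running.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]   # majority class
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]   # minority class
    y = [2] * 200 + [3] * 20

    models = [fit_centroids(Xs, ys) for Xs, ys in balanced_subsets(X, y, n_subsets=5)]
    y_pred = [ensemble_predict(models, row) for row in X]
    print(recall_per_class(y, y_pred))
```

Reporting recall separately for each class, rather than overall accuracy alone, matters here because a model can reach high accuracy on imbalanced data while missing most minority-class samples.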
Authors
葛志辉
邢江宽
罗坤
樊建人
GE Zhi-hui; XING Jiang-kuan; LUO Kun; FAN Jian-ren (State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou 310027, China)
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
Peking University Core Journals
2021, No. 8, pp. 1566-1575 (10 pages)
Journal of Zhejiang University: Engineering Science
Funding
Supported by the National Key Research and Development Program of China (2017YFB0601805).
Keywords
clustering
sampling
ensemble learning
desulfurization system
preferred sample selection