Abstract
A cooperative control strategy combining deep deterministic policy gradient (DDPG) with a proportional-integral-derivative (PID) controller, optimized through multidimensional state information and a segmented reward function, is proposed for selective catalytic reduction (SCR) denitrification systems, which are characterized by large inertia and multiple disturbances. The SCR denitrification system behaves as a partially observable Markov decision process (POMDP), which lowers the policy learning efficiency of the DDPG algorithm. To address this problem, multidimensional state information for the SCR denitrification system is designed first. Second, a segmented reward function for the system is designed. Finally, a DDPG-PID cooperative control strategy is designed to control the SCR denitrification system. The results show that the proposed DDPG-PID cooperative control strategy improves the policy learning efficiency of the DDPG algorithm and the control performance of the PID controller, while also providing strong setpoint tracking, disturbance rejection, and robustness.
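The three design steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the state variables, the reward segments, or the way the DDPG output is combined with the PID output, so everything below is an assumption made for illustration: the state fields, the thresholds `tight` and `loose`, the reward slopes, and the additive combination in `cooperative_action` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook positional PID. Gains are placeholders, not the paper's values."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        # Accumulate the integral term and approximate the derivative.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def build_state(error, prev_error, integral, disturbance):
    """Hypothetical multidimensional state: error, its change, its integral,
    and one measured disturbance. Adding history is a common way to reduce
    partial observability; the paper's actual state may differ."""
    return (error, error - prev_error, integral, disturbance)


def segmented_reward(error, tight=1.0, loose=5.0):
    """Hypothetical piecewise reward on the absolute tracking error.
    Thresholds and slopes are illustrative assumptions."""
    e = abs(error)
    if e <= tight:
        return 1.0          # inside the tight band: fixed bonus
    if e <= loose:
        return -0.1 * e     # moderate error: mild penalty
    return -1.0 * e         # large error: strong penalty


def cooperative_action(pid_output, agent_correction, u_min=0.0, u_max=100.0):
    """Assumed additive cooperation: the agent's action corrects the PID
    output, and the sum is clipped to the actuator range."""
    return min(u_max, max(u_min, pid_output + agent_correction))
```

In this sketch, `agent_correction` stands in for the output of a trained DDPG actor evaluated on `build_state(...)`; the actor and critic networks themselves are omitted.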
Authors
赵征
刘子涵
ZHAO Zheng; LIU Zihan (School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, Hebei Province, China)
Source
《动力工程学报》
CAS
CSCD
Peking University Core Journal
2024, Issue 5, pp. 802-809 (8 pages)
Journal of Chinese Society of Power Engineering
Funding
Shenzhen Science and Technology Program (KCXFZ20201221173402007).
Keywords
DDPG
reinforcement learning
SCR denitrification system
cooperative control
multidimensional state
segmented reward function