Abstract
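The proton exchange membrane fuel cell (PEMFC) is a nonlinear system that is difficult to model accurately, so controlling the PEMFC stack temperature requires a controller with strong robustness and high adaptability. This paper proposes a data-driven controller based on deep reinforcement learning to control the stack temperature. Considering the characteristics of the PEMFC system, including its nonlinearity, its uncertainty, and the influence of environmental conditions, a new deep reinforcement learning algorithm is proposed: the classified replay twin delayed Bayesian deep deterministic policy gradient (CTDB-DDPG) algorithm. Its design introduces techniques such as Bayesian neural networks and classified experience replay, which improve the performance of the controller. Simulation results and results from the RT-Lab experimental platform show that, owing to the high adaptability and strong robustness of CTDB-DDPG, the proposed algorithm controls the PEMFC stack temperature more effectively and is of practical significance.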
Proton exchange membrane fuel cells (PEMFCs) are strongly nonlinear and difficult to model accurately. In addition, the radiator and the circulating water pump in the water and thermal management system of the fuel cell system are strongly coupled, which makes it difficult for model-based control algorithms to control the fuel cell temperature accurately. This paper therefore proposes a data-driven, model-free algorithm, the classified replay twin delayed Bayesian deep deterministic policy gradient (CTDB-DDPG), to control the fuel cell temperature system. First, deep deterministic policy gradient is used to avoid the intricate modeling of the fuel cell. Second, a classified experience replay strategy is added to the algorithm: CTDB-DDPG uses two experience buffer pools to store experience data. When the network model is constructed, the average TD error of all samples in the two pools is initialized to 0. Whenever new experience data is generated, the average TD error over all experience data is updated first. If the TD error of the new sample exceeds this mean, the sample is stored in experience buffer pool I; otherwise, it is stored in experience buffer pool II. Classifying experience samples by TD error helps make better use of the experience data when training the network model. Third, CTDB-DDPG accounts for the uncertainty of the neural network by incorporating a Bayesian neural network, and the proposed bootstrap with random initialization yields a reasonable uncertainty estimate. At the beginning of each episode, or at fixed intervals during learning, an unbiased hypothesis is drawn from the posterior distribution of the MDP parameters and estimated with a bootstrapped value function that has multiple heads on a shared network, which does not require additional computational resources. Moreover, using Q-learning preserves the uncertainty of the cumulative discounted return, which is more effective for environments that require deep exploration. Randomly selecting a head network to simulate Thompson sampling effectively avoids the ineffective exploration of the agent under a noise-based strategy and accelerates the convergence of the CTDB-DDPG algorithm. In addition, because the fuel cell thermal management system has a large inertia, the algorithm adds OU noise to the action to improve exploration efficiency. OU noise is temporally correlated noise drawn from the Ornstein-Uhlenbeck process; this temporal correlation helps the algorithm explore different strategies and find possibly better ones, improving its performance and efficiency. Although adding noise can degrade the algorithm's performance in the short term, in the long term it can help the algorithm avoid falling into a local optimum and may help it find a better strategy. Finally, the algorithm is tested on the simulation platform Simulink and on the experimental platform RT-Lab, and similar conclusions are obtained on both, verifying its effectiveness. Although the CTDB-DDPG temperature control strategy has been validated on simulation and hardware-in-the-loop test platforms, more complex real-world working conditions, such as variations in ambient temperature and humidity and equipment aging, will be considered in future studies to test and improve the adaptability and robustness of the algorithm across a broader range of situations.
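The classified experience replay rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `ClassifiedReplayBuffer`, the use of the absolute TD error, the running mean over all samples seen so far, and the `high_fraction` sampling ratio are all assumptions made for the example.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Two-pool replay buffer split by a running mean of |TD error| (hypothetical sketch)."""

    def __init__(self, capacity=10000, high_fraction=0.5, seed=None):
        self.pool_high = deque(maxlen=capacity)  # pool I: TD error above the mean
        self.pool_low = deque(maxlen=capacity)   # pool II: TD error at or below the mean
        self.mean_td = 0.0                       # average TD error starts at 0, as in the abstract
        self.count = 0                           # number of samples seen so far
        self.high_fraction = high_fraction       # assumed sampling ratio, not given in the source
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        td = abs(td_error)  # assumption: classify on the magnitude of the TD error
        # Update the running mean first, then classify the new sample against it.
        self.count += 1
        self.mean_td += (td - self.mean_td) / self.count
        if td > self.mean_td:
            self.pool_high.append(transition)
        else:
            self.pool_low.append(transition)

    def sample(self, batch_size):
        # Draw from both pools; take fewer samples if a pool is still small.
        n_high = min(len(self.pool_high), round(batch_size * self.high_fraction))
        n_low = min(len(self.pool_low), batch_size - n_high)
        batch = self.rng.sample(list(self.pool_high), n_high)
        batch += self.rng.sample(list(self.pool_low), n_low)
        return batch
```

In a training loop, each transition would be added together with the TD error computed by the critic, and minibatches would be drawn with `sample`. How the two pools are mixed during sampling is not stated in the abstract, so the fixed ratio here is only a placeholder.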
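The per-episode head selection that imitates Thompson sampling can be illustrated with a toy sketch. Here each head is a stand-in linear map with its own random initialization rather than a head on a shared deep network, and the class name `BootstrappedHeads`, the `mask_prob` parameter, and the Bernoulli bootstrap mask are assumptions borrowed from common bootstrapped value function practice, not details given in the abstract.

```python
import random


class BootstrappedHeads:
    """Toy multi-head ensemble with per-episode head selection (hypothetical sketch)."""

    def __init__(self, num_heads=5, state_dim=2, mask_prob=0.5, seed=None):
        self.rng = random.Random(seed)
        # Random initialization gives every head a different starting hypothesis.
        self.heads = [
            [self.rng.uniform(-1.0, 1.0) for _ in range(state_dim)]
            for _ in range(num_heads)
        ]
        self.mask_prob = mask_prob  # assumed probability that a head trains on a sample
        self.active = 0

    def begin_episode(self):
        # Pick one head uniformly at random and follow it for the whole episode.
        self.active = self.rng.randrange(len(self.heads))
        return self.active

    def act(self, state):
        # Linear stand-in for the selected head's output.
        weights = self.heads[self.active]
        return sum(w * s for w, s in zip(weights, state))

    def bootstrap_mask(self):
        # Decide, per head, whether it would be updated with the current transition.
        return [self.rng.random() < self.mask_prob for _ in self.heads]
```

Committing to one head for a full episode is what gives temporally consistent exploration, in contrast to perturbing every action independently.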
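As an illustration of the OU exploration noise mentioned in the abstract, the sketch below uses the standard Euler discretization of the Ornstein-Uhlenbeck process. The parameter values (`theta`, `sigma`, `dt`) and the class name `OUNoise` are assumptions chosen for the example; the abstract does not report the settings used in the paper.

```python
import math
import random


class OUNoise:
    """Temporally correlated noise from a discretized Ornstein-Uhlenbeck process."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, x0=None, seed=None):
        self.mu = mu          # long-run mean the noise reverts to
        self.theta = theta    # mean-reversion rate
        self.sigma = sigma    # scale of the random perturbation
        self.dt = dt          # time step of the discretization
        self.x0 = mu if x0 is None else x0
        self.x = self.x0
        self.rng = random.Random(seed)

    def reset(self):
        # Restart the process, for example at the beginning of an episode.
        self.x = self.x0

    def sample(self):
        # Euler step: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x
```

The noise sample would typically be added to the actor's output and the result clipped to the actuator limits. Because consecutive samples are correlated, the perturbation pushes a slow, high-inertia plant in one direction for several steps instead of averaging out.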
Authors
赵洪山
潘思潮
马利波
吴雨晨
吕廷彦
Zhao Hongshan; Pan Sichao; Ma Libo; Wu Yuchen; Lü Tingyan (Key Laboratory of Distributed Energy Storage and Microgrid of Hebei Province, North China Electric Power University, Baoding 071003, China)
Source
《电工技术学报》
EI
CSCD
Peking University Core Journals (北大核心)
2024, Issue 13, pp. 4240-4256 (17 pages)
Transactions of China Electrotechnical Society
Keywords
Fuel cell
Joint control
Deep reinforcement learning
Deep deterministic
Bayesian network