Abstract
To address the dispatch problem of wind-photovoltaic-storage hybrid systems, a joint dispatch model based on deep reinforcement learning is proposed. First, a joint dispatch model is established that fully accounts for the constraints of the wind, photovoltaic, and energy storage stations, with the objective of minimizing the costs of dispatch-plan tracking, wind and solar curtailment, and energy storage operation. Then, the system state variables, action variables, and reward function of this model are defined within the reinforcement learning framework, and the deep deterministic policy gradient (DDPG) algorithm is introduced. Its environment-interaction and policy-exploration mechanisms are used to learn the joint dispatch strategy, so that the hybrid system tracks its power plan while reducing wind and solar curtailment and energy storage charging and discharging. Finally, historical wind power, photovoltaic, and dispatch-plan data from a region in northwestern China are used to train the model and carry out case studies. The results show that the proposed method adapts well to wind and solar variations across different periods and yields a dispatch strategy for the hybrid system under given wind and photovoltaic conditions.
Authors
张淑兴
马驰
杨志学
王尧
吴昊
任洲洋
ZHANG Shuxing; MA Chi; YANG Zhixue; WANG Yao; WU Hao; REN Zhouyang (China Nuclear Power Technology Research Institute Co., Ltd., Shenzhen 518000, China; CGN New Energy Holdings Co., Ltd., Beijing 100084, China; State Key Laboratory of Power Transmission Equipment & System Security and New Technology (Chongqing University), Chongqing 400044, China)
Source
《中国电力》
CSCD
PKU Core Journal (北大核心)
2023, Issue 2, pp. 68-76 (9 pages)
Electric Power
Funding
Supported by the National Natural Science Foundation of China (No. 51677012).
Keywords
wind-photovoltaic-storage hybrid system (风光储联合系统)
joint scheduling strategy (联合调度策略)
uncertainty (不确定性)
deep reinforcement learning (深度强化学习)
deep deterministic policy gradient algorithm (深度确定性策略梯度算法)