The complexity and uncertainty in power systems pose great challenges to controlling power grids. As a popular data-driven technique, deep reinforcement learning (DRL) has attracted attention for power grid control. However, DRL has inherent drawbacks in terms of data efficiency and explainability. This paper presents a novel hierarchical task planning (HTP) approach, bridging planning and DRL, for the task of power line flow regulation. First, we introduce a three-level task hierarchy to model the task, and model the sequence of task units on each level as a task planning-Markov decision process (TP-MDP). Second, we model the task as a sequential decision-making problem and introduce a higher planner and a lower planner in HTP to handle different levels of task units. In addition, we introduce a two-layer knowledge graph that is updated dynamically during the planning procedure to assist HTP. Experimental results on the IEEE 118-bus and IEEE 300-bus systems demonstrate that our HTP approach outperforms proximal policy optimization, a state-of-the-art DRL approach, improving efficiency by 26.16% and 6.86% on the two systems, respectively.
Funding: Supported in part by the National Key R&D Program of China (2018AAA0101501) and the science and technology project of SGCC (State Grid Corporation of China).
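The division of labour between a higher planner and a lower planner, as described in the abstract, can be illustrated with a toy sketch. This is not the paper's algorithm: it omits the third hierarchy level, the TP-MDP formulation, the DRL component, and the knowledge graph. The `Line` class, the "most overloaded line first" rule, the fixed `step` size, and all numeric values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """A toy power line; all values are hypothetical, in MW."""
    name: str
    flow: float   # current line flow
    limit: float  # flow limit the line should stay under


def higher_planner(lines):
    """Pick the next subtask: the most overloaded line (assumed rule)."""
    overloaded = [ln for ln in lines if ln.flow > ln.limit]
    if not overloaded:
        return None  # nothing left to regulate
    return max(overloaded, key=lambda ln: ln.flow / ln.limit)


def lower_planner(line, step=10.0):
    """Pick a concrete action for the subtask: a flow reduction (assumed rule)."""
    return min(step, line.flow - line.limit)


def regulate(lines, max_steps=100):
    """Alternate the two planners until no line exceeds its limit."""
    history = []
    for _ in range(max_steps):
        target = higher_planner(lines)
        if target is None:
            break
        delta = lower_planner(target)
        target.flow -= delta  # stand-in for applying a real control action
        history.append((target.name, delta))
    return history


lines = [Line("A", 120.0, 100.0), Line("B", 95.0, 100.0), Line("C", 65.0, 50.0)]
history = regulate(lines)
print(history)
```

In the approach the abstract describes, each planner would presumably be a learned or search-based policy rather than a fixed rule; the sketch only shows the control flow in which the higher level selects what to work on and the lower level selects how.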