Funding: Supported by the National Natural Science Foundation of China (62173333) and the Australian Research Council Discovery Program (DP200101199).
Abstract: The P-type update law has been the mainstream technique used in iterative learning control (ILC) systems; it resembles linear feedback control and yields asymptotic convergence. In recent years, finite-time control strategies such as terminal sliding mode control have been shown to be effective in ramping up convergence speed by introducing a fractional power into the feedback. In this paper, we show that such a mechanism can equally ramp up the learning speed in ILC systems. We first propose a fractional power update rule for ILC of single-input single-output linear systems. A nonlinear error dynamics is constructed along the iteration axis to illustrate the evolutionary convergence process. Using the nonlinear mapping approach, we prove fast convergence towards the limit cycles of tracking errors that inherently exist in ILC systems. The limit cycles are shown to be tunable, which determines the steady states. Numerical simulations are provided to verify the theoretical results.
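To make the contrast concrete, here is a minimal sketch of the two update laws along the iteration axis k (notation assumed for illustration, not taken verbatim from the paper), where $u_k$ is the control input, $e_k$ the tracking error, and $\gamma > 0$ the learning gain:

$$u_{k+1}(t) = u_k(t) + \gamma\, e_k(t) \qquad \text{(P-type)}$$
$$u_{k+1}(t) = u_k(t) + \gamma\, |e_k(t)|^{\alpha}\, \operatorname{sgn}\big(e_k(t)\big), \quad 0 < \alpha < 1 \qquad \text{(fractional power)}$$

The intuition for the speed-up: when $0 < |e| < 1$, the fractional power satisfies $|e|^{\alpha} > |e|$, so the correction applied per iteration shrinks more slowly than in the linear P-type case as the error approaches zero.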
Abstract: As a media learning platform, the "Learning Power" platform integrates the advantages of the internet, big data, and new media. Through the supply of massive explicit and implicit learning resources, as well as the construction of the interactive space of "Learning Power," it fully embodies the education mechanism of moral education. Specifically, this is reflected in the distinctive political position and the education goal mechanism of "moral education," the education operation mechanism of "explicit and implicit unity," the learning mechanism of "autonomy and cooperation integration," and the feedback incentive mechanism of "gamification." The organic combination and interactive operation of these four mechanisms form a collaborative education mechanism system of goal orientation, education operation, learning process, and feedback incentive.
Funding: This work was supported by the Competence Centre for Cognitive Energy Systems of the Fraunhofer IEE and the research group Reinforcement Learning for Cognitive Energy Systems (RL4CES) of the Intelligent Embedded Systems group at the University of Kassel.
Abstract: The operation of electricity grids has become increasingly complex due to the current upheaval and the increase in renewable energy production. As a consequence, active grid management is reaching its limits with conventional approaches. In the context of the Learning to Run a Power Network (L2RPN) challenge, it has been shown that Reinforcement Learning (RL) is an efficient and reliable approach with considerable potential for automatic grid operation. In this article, we analyse the submitted agent from Binbinchen and provide novel strategies to improve the agent, both for the RL and the rule-based approach. The main improvement is an N-1 strategy, where we consider topology actions that keep the grid stable even if one line is disconnected. Moreover, we propose a topology reversion to the original grid, which proved to be beneficial. The improvements are tested against reference approaches on the challenge test sets and increase the performance of the rule-based agent by 27%. In a direct comparison between the rule-based and RL agents, we find similar performance; however, the RL agent has a clear computational advantage. We also analyse the behaviour in an exemplary case in more detail to provide additional insights. Here, we observe that, through the N-1 strategy, the actions of both the rule-based and the RL agent become more diversified.
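As a rough illustration of such an N-1 strategy, the Python sketch below filters candidate topology actions by simulating every single-line outage and keeping only actions under which all line loadings stay below their limit. The interface (simulate, sim_obs.rho) is a hypothetical stand-in for a grid simulation backend like the one used in L2RPN; it is not the authors' code.

def survives_n_minus_1(obs, action, n_lines, simulate):
    """Return True if `action` keeps every line loading rho < 1.0
    under each possible single-line disconnection."""
    for line_id in range(n_lines):
        # Simulate applying the action with line `line_id` out of service.
        sim_obs, grid_failed = simulate(obs, action, disconnect_line=line_id)
        if grid_failed or any(rho >= 1.0 for rho in sim_obs.rho):
            return False
    return True

def n_minus_1_filter(obs, candidate_actions, n_lines, simulate):
    # Keep only topology actions that remain stable under any single outage.
    return [a for a in candidate_actions
            if survives_n_minus_1(obs, a, n_lines, simulate)]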
Funding: Supported in part by the National Key R&D Program of China (2018AAA0101501) and the science and technology project of SGCC (State Grid Corporation of China).
Abstract: The complexity and uncertainty in power systems pose great challenges to controlling power grids. As a popular data-driven technique, deep reinforcement learning (DRL) has attracted attention in the control of power grids. However, DRL has some inherent drawbacks in terms of data efficiency and explainability. This paper presents a novel hierarchical task planning (HTP) approach, bridging planning and DRL, for the task of power line flow regulation. First, we introduce a three-level task hierarchy to model the task and model the sequence of task units on each level as task-planning Markov decision processes (TP-MDPs). Second, we model the task as a sequential decision-making problem and introduce a higher planner and a lower planner in HTP to handle different levels of task units. In addition, we introduce a two-layer knowledge graph that can update dynamically during the planning procedure to assist HTP. Experimental results on the IEEE 118-bus and IEEE 300-bus systems demonstrate that our HTP approach outperforms proximal policy optimization, a state-of-the-art DRL approach, improving efficiency by 26.16% and 6.86% on the two systems, respectively.
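The two-level planning loop described in the abstract can be pictured with the following minimal Python sketch; all class and method names (HigherPlanner, LowerPlanner, next_task_unit, plan_actions, kg.update) are hypothetical placeholders for the structure the abstract describes, not the authors' implementation.

class HigherPlanner:
    """Selects the next task unit at the upper level of the hierarchy."""
    def next_task_unit(self, state, knowledge_graph):
        raise NotImplementedError

class LowerPlanner:
    """Expands a task unit into concrete control actions."""
    def plan_actions(self, task_unit, state, knowledge_graph):
        raise NotImplementedError

def htp_episode(env, higher, lower, kg):
    # Hypothetical env interface: reset() -> state, step(a) -> (state, done).
    state, done = env.reset(), False
    while not done:
        task = higher.next_task_unit(state, kg)             # upper-level decision
        for action in lower.plan_actions(task, state, kg):  # lower-level decisions
            state, done = env.step(action)
            kg.update(state)  # knowledge graph updates dynamically during planning
            if done:
                break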
Funding: This work was supported by the National Program on Key Basic Research Project of China (Grant Nos. 2021YFA1400900 and 2017YFA0304204), the National Natural Science Foundation of China (Grant Nos. 11774067 and 11934002), the Shanghai Municipal Science and Technology Major Project (Grant No. 2019SHZDZX01), the Shanghai Science Foundation (Grant No. 19ZR1471500), and the Open Project of Shenzhen Institute of Quantum Science and Engineering (Grant No. SIQSE202002). X.Q. acknowledges support from the National Postdoctoral Program for Innovative Talents of China under Grant No. BX20190083.
Abstract: Harnessing the quantum computation power of present noisy intermediate-scale quantum (NISQ) devices has received tremendous interest in the last few years. Here we study the learning power of a one-dimensional, long-range, randomly coupled quantum spin chain within the framework of reservoir computing. In time-sequence learning tasks, we find that the system in the quantum many-body localized (MBL) phase holds long-term memory, which can be attributed to the emergent local integrals of motion. On the other hand, the MBL phase does not provide sufficient nonlinearity for learning highly nonlinear time sequences, which we show in a parity-check task. This is reversed in the quantum ergodic phase, which provides sufficient nonlinearity but compromises memory capacity. In a complex learning task of Mackey–Glass prediction, which requires both sufficient memory capacity and nonlinearity, we find optimal learning performance near the MBL-to-ergodic transition. This leads to a guiding principle of quantum reservoir engineering: operating at the edge of quantum ergodicity yields optimal learning power for generic complex reservoir learning tasks. Our theoretical finding can be tested with near-term NISQ devices.
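For orientation, reservoir computing trains only a linear readout on observables collected from the fixed dynamical reservoir (here, the spin chain). The Python sketch below shows that generic training step as ridge regression; the feature matrix X is an assumed stand-in for the measured spin-chain observables, not the paper's actual data.

import numpy as np

def train_readout(X, y, ridge=1e-6):
    """X: (T, M) reservoir observables over T time steps; y: (T,) targets.
    Returns readout weights w minimizing ||X w - y||^2 + ridge * ||w||^2."""
    M = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(M), X.T @ y)

def predict(X, w):
    # Linear readout: all memory and nonlinearity come from the reservoir itself.
    return X @ w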