Funding: This work was supported by the National Key R&D Program of China (2018AAA0101400), the National Natural Science Foundation of China (62173251, 61921004, U1713209), and the Natural Science Foundation of Jiangsu Province of China (BK20202006).
Abstract: Driven by the development of the smart grid, the active distribution network (ADN) has attracted much attention due to its capability for active management. By making full use of electricity price signals for optimal scheduling, the total cost of the ADN can be reduced. However, the optimal day-ahead scheduling problem is challenging because future electricity prices are unknown. Moreover, in an ADN some schedulable variables are continuous while others are discrete, which increases the difficulty of determining the optimal scheduling scheme. In this paper, the day-ahead scheduling problem of the ADN is formulated as a Markov decision process (MDP) with a continuous-discrete hybrid action space. An algorithm based on multi-agent hybrid reinforcement learning (HRL) is then proposed to obtain the optimal scheduling scheme. The proposed algorithm adopts a centralized-training, decentralized-execution structure, and different methods are applied to determine the selection policies for the continuous and discrete scheduling variables. Simulation results demonstrate the effectiveness of the algorithm.
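A hybrid continuous-discrete action of the kind this abstract describes can be sketched as follows. The sub-action names (a transformer tap as the discrete variable, storage power as the continuous one), the epsilon-greedy rule, and the bounds are all illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of one hybrid-action selection in ADN scheduling:
# a discrete sub-action picked epsilon-greedily from value estimates, and
# a continuous sub-action clipped to its feasible range. Tap positions and
# storage power limits below are toy placeholders, not from the paper.
import random

TAP_POSITIONS = [0, 1, 2]          # discrete: e.g. a transformer tap setting
P_ESS_MIN, P_ESS_MAX = -1.0, 1.0   # continuous: e.g. storage power (MW)

def select_hybrid_action(q_discrete, mu_continuous, epsilon=0.1, rng=random):
    """Pick a discrete sub-action epsilon-greedily and clip the continuous one."""
    if rng.random() < epsilon:
        k = rng.randrange(len(TAP_POSITIONS))          # explore
    else:
        k = max(range(len(TAP_POSITIONS)),
                key=lambda i: q_discrete[i])           # exploit: argmax Q
    p = min(max(mu_continuous, P_ESS_MIN), P_ESS_MAX)  # project onto limits
    return TAP_POSITIONS[k], p

action = select_hybrid_action(q_discrete=[0.2, 0.9, 0.1],
                              mu_continuous=1.7, epsilon=0.0)
print(action)  # (1, 1.0): greedy tap index 1, storage power clipped to 1.0
```

In a multi-agent, centralized-training decentralized-execution setup, each agent would run a selection step like this locally at execution time.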
Funding: Supported by the National Key R&D Program of China (2018AAA0101400), the Natural Science Foundation of Jiangsu Province of China (BK20202006), and the National Natural Science Foundation of China (61921004, 62173251).
Abstract: In this paper, the optimal control of nonlinear switching systems is investigated without knowledge of the system dynamics. First, the Hamilton-Jacobi-Bellman (HJB) equation is derived with consideration of the hybrid action space. Then, a novel data-based hybrid Q-learning (HQL) algorithm is proposed to find the optimal solution in an iterative manner. In addition, theoretical analysis is provided to establish the convergence and optimality of the proposed algorithm. Finally, the algorithm is implemented with an actor-critic (AC) structure, and two linear-in-parameter neural networks are utilized to approximate the functions. Simulation results validate the effectiveness of the data-driven method.
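The hybrid action in a switching system pairs a discrete switching mode with a continuous control input, so a Bellman backup minimizes over both jointly. The sketch below illustrates that one idea with toy scalar dynamics and a quadratic cost; the dynamics, cost, and value stand-ins are assumptions for illustration only, not the paper's HQL formulation.

```python
# Minimal sketch of one hybrid Bellman backup for a switching system:
# the optimal action minimizes stage cost plus discounted next value over
# BOTH the discrete mode and a (here, gridded) continuous input. The toy
# two-mode scalar system below is illustrative, not the paper's model.
def bellman_backup(x, modes, u_grid, cost, dynamics, value, gamma=0.95):
    """Return (optimal Q, switching mode, continuous control) at state x."""
    best = None
    for v in modes:
        for u in u_grid:
            q = cost(x, v, u) + gamma * value(dynamics(x, v, u))
            if best is None or q < best[0]:
                best = (q, v, u)
    return best

# Toy system: two modes with different gains, quadratic stage cost,
# and a quadratic function standing in for the learned critic.
dyn = lambda x, v, u: (0.5 if v == 0 else 0.9) * x + u
cst = lambda x, v, u: x * x + u * u
val = lambda x: x * x
q, mode, u = bellman_backup(2.0, modes=[0, 1], u_grid=[-1.0, 0.0, 1.0],
                            cost=cst, dynamics=dyn, value=val)
print(mode, u)  # the lower-gain mode with zero input is cheapest here
```

An iterative HQL-style scheme would repeat such backups to improve the critic, with the actor networks replacing the brute-force grid search over u.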
Abstract: The performance of state-of-the-art deep reinforcement learning algorithms such as Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC) for generating a quadruped walking gait in a virtual environment was presented in the previous research work titled "A Comparison of PPO, TD3, and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation". We demonstrated that the Soft Actor-Critic algorithm had the best performance in generating the walking gait for a quadruped under certain sensor configurations in the virtual environment. In this work, we present a performance analysis of the same algorithms for quadruped walking gait generation in a physical environment. Performance in the physical environment is determined by transfer learning augmented with real-time reinforcement learning for gait generation on a physical quadruped. The performance is analyzed on a quadruped equipped with a range of sensors: position tracking using a stereo camera, contact sensing for each robot leg through force-resistive sensors, and proprioceptive information for the robot body and legs from nine inertial measurement units. The comparison uses the metrics associated with the walking gait: average forward velocity (m/s), average forward velocity variance, average lateral velocity (m/s), average lateral velocity variance, and quaternion root mean square deviation. The strengths and weaknesses of each algorithm for the given task on the physical quadruped are discussed.
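One plausible reading of the quaternion root-mean-square-deviation metric listed above is a component-wise RMSD between measured and reference body orientations, after sign-aligning each quaternion (q and -q encode the same rotation). The paper may compute it differently; this is an illustrative sketch only.

```python
# Illustrative sketch (an assumption, not the paper's definition): quaternion
# RMSD between a measured orientation trace and a reference trace, resolving
# the q / -q double cover before differencing.
import math

def quat_rmsd(measured, reference):
    """Component-wise RMSD over paired (w, x, y, z) quaternions."""
    total, n = 0.0, 0
    for q, r in zip(measured, reference):
        dot = sum(a * b for a, b in zip(q, r))
        q = [a if dot >= 0 else -a for a in q]   # pick the nearer sign of q
        total += sum((a - b) ** 2 for a, b in zip(q, r))
        n += len(q)
    return math.sqrt(total / n)

ident = (1.0, 0.0, 0.0, 0.0)
# -identity is the same rotation as identity, so the deviation is zero.
print(quat_rmsd([ident, (-1.0, 0.0, 0.0, 0.0)], [ident, ident]))  # 0.0
```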
Funding: Supported by the National Natural Science Foundation of China (No. 62002016), the Science and Technology Development Fund, Macao S.A.R. (No. 0137/2019/A3), the Beijing Natural Science Foundation (No. 9204028), and the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515111165).
Abstract: This paper develops deep reinforcement learning (DRL) algorithms for optimizing the operation of a home energy system consisting of photovoltaic (PV) panels, a battery energy storage system, and household appliances. Model-free DRL algorithms can efficiently handle the difficulty of energy system modeling and the uncertainty of PV generation. However, the discrete-continuous hybrid action space of the considered home energy system challenges existing DRL algorithms designed for either discrete or continuous actions. Thus, a mixed deep reinforcement learning (MDRL) algorithm is proposed that integrates the deep Q-learning (DQL) algorithm and the deep deterministic policy gradient (DDPG) algorithm: DQL handles the discrete actions, while DDPG handles the continuous ones. The MDRL algorithm learns the optimal strategy through trial-and-error interactions with the environment. However, unsafe actions that violate system constraints can incur great cost. To handle this problem, a safe-MDRL algorithm is further proposed. Simulation studies demonstrate that the proposed MDRL algorithm can efficiently handle the challenge of the discrete-continuous hybrid action space in home energy management. Compared with benchmark algorithms on the test dataset, the MDRL algorithm reduces operation cost while maintaining human thermal comfort. Moreover, the safe-MDRL algorithm greatly reduces the loss of thermal comfort during the learning stage compared with the proposed MDRL algorithm.
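The split the abstract describes, with a DQL-style head choosing the discrete action and a DDPG-style actor proposing the continuous one, can be sketched in a single decision step. The appliance/battery naming, the argmax-over-Q rule, and the projection used to mimic the safe variant's constraint handling are all illustrative assumptions.

```python
# Hypothetical sketch of one MDRL decision step: a DQL-style head picks the
# discrete action (e.g. an appliance on/off), a DDPG-style actor outputs the
# continuous action (e.g. battery power, kW), and a projection onto limits
# stands in for the safe variant's constraint handling. Numbers are toys.
def mdrl_step(q_values, actor_output, battery_limits=(-2.0, 2.0)):
    discrete = max(range(len(q_values)), key=lambda i: q_values[i])  # argmax Q
    lo, hi = battery_limits
    continuous = min(max(actor_output, lo), hi)  # project onto feasible power
    return discrete, continuous

# Appliance choices {0: off, 1: on}; the actor proposes 2.6 kW, which
# exceeds the assumed 2.0 kW battery limit and gets projected back.
action = mdrl_step([0.3, 0.8], 2.6)
print(action)  # (1, 2.0)
```

During training, the discrete head would be updated with Q-learning targets and the actor with deterministic policy gradients through a shared critic.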
Funding: Supported by the China Automobile Industry Innovation and Development Joint Fund (U1864206).
Abstract: Purpose – This study aims to propose an enhanced eco-driving strategy based on reinforcement learning (RL) to alleviate the mileage anxiety of electric vehicles (EVs) in the connected environment. Design/methodology/approach – In this paper, an enhanced eco-driving control strategy based on an advanced RL algorithm in a hybrid action space (EEDC-HRL) is proposed for connected EVs. The EEDC-HRL simultaneously controls longitudinal velocity and lateral lane-changing maneuvers to achieve greater eco-driving potential. Moreover, this study redesigns an all-purpose, efficient-to-train reward function that aims to achieve energy savings while ensuring other aspects of driving performance. Findings – To illustrate the performance of the EEDC-HRL, the controlled EV was trained and tested in various traffic flow states. The experimental results demonstrate that the proposed technique can effectively improve energy efficiency without sacrificing travel efficiency, comfort, safety, or lane-changing performance in different traffic flow states. Originality/value – The contributions of this paper are two-fold. An enhanced eco-driving strategy based on an advanced RL algorithm in a hybrid action space (EEDC-HRL) is proposed to jointly optimize longitudinal velocity and lateral lane-changing for connected EVs. A full-scale reward function consisting of multiple sub-rewards with a safety control constraint is redesigned to achieve eco-driving while ensuring other driving performance.
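A multi-term reward with a safety constraint, in the spirit of the "full-scale reward function" described above, can be sketched as a weighted sum of sub-rewards gated by a hard safety check. The specific terms, weights, and the headway threshold are assumptions for illustration, not the authors' actual design.

```python
# Illustrative eco-driving reward sketch (weights, terms, and the headway
# threshold are assumptions): energy, comfort, and lane-change sub-rewards
# are combined, and a safety constraint dominates everything else.
def eco_reward(energy_kwh, jerk, headway_s, lane_change,
               w=(1.0, 0.2, 0.1), min_headway_s=1.0):
    if headway_s < min_headway_s:        # safety control constraint dominates
        return -10.0
    r_energy = -w[0] * energy_kwh        # penalize energy consumption
    r_comfort = -w[1] * abs(jerk)        # penalize harsh longitudinal changes
    r_lane = -w[2] * (1.0 if lane_change else 0.0)  # discourage idle weaving
    return r_energy + r_comfort + r_lane

# Safe cruising costs only its energy term; an unsafe headway is dominated
# by the constraint penalty regardless of the other terms.
print(eco_reward(0.5, 0.0, 2.0, False))  # -0.5
print(eco_reward(0.5, 0.0, 0.5, False))  # -10.0
```

Balancing such sub-reward weights is typically what makes a single reward serve both energy saving and the other driving-performance goals across traffic states.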