Abstract: Because the environment in which multiple agents operate changes dynamically, and each agent's decisions in turn affect the other agents, single-agent deep reinforcement learning algorithms struggle to remain stable in multi-agent environments. To adapt to multi-agent settings, this paper improves the single-agent deep reinforcement learning algorithm Soft Actor-Critic (SAC) under the Centralized Training with Decentralized Execution (CTDE) framework, introduces an inter-agent communication mechanism, and constructs the Multi-Agent Soft Actor-Critic (MASAC) algorithm. In MASAC, agents share observations and historical experience, which effectively reduces the impact of environmental non-stationarity on the algorithm. Finally, the performance of MASAC is evaluated experimentally on cooperative tasks and on mixed cooperative-competitive tasks; the results show that MASAC is more stable than SAC in multi-agent environments.
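To make the CTDE structure described in this abstract concrete, the following is a minimal PyTorch sketch of decentralized actors paired with a centralized critic. The class names, layer sizes, and dimensions are illustrative assumptions, not the paper's MASAC implementation.

```python
# Minimal CTDE sketch: decentralized actors, one centralized critic.
# All sizes and names here are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: each agent acts on its *local* observation only."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # continuous action in [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Centralized Q-function: sees the observations and actions of ALL agents
    during training, which is what removes the non-stationarity each agent
    would otherwise face from its own point of view."""
    def __init__(self, n_agents: int, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_acts: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

# Toy usage: 3 agents, each with a 4-dim observation and 2-dim action.
n_agents, obs_dim, act_dim = 3, 4, 2
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critic = CentralizedCritic(n_agents, obs_dim, act_dim)

obs = torch.randn(1, n_agents, obs_dim)  # batch of joint observations
acts = torch.stack([a(obs[:, i]) for i, a in enumerate(actors)], dim=1)
q = critic(obs.flatten(1), acts.flatten(1))  # training-time centralized value
print(q.shape)  # torch.Size([1, 1]); at execution time only the actors are used
```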
Funding: Supported in part by the National Science Foundation of China (61973247, 61673315, 62173268), the Key Research and Development Program of Shaanxi (2022GY-033), the National Postdoctoral Innovative Talents Support Program of China (BX20200272), the Key Program of the National Natural Science Foundation of China (61833015), and the Fundamental Research Funds for the Central Universities (xzy022021050).
Abstract: The smart grid utilizes demand-side management technology to motivate energy users to cut demand during peak power-consumption periods, which greatly improves the operating efficiency of the power grid. However, as the number of energy users participating in the smart grid continues to increase, the demand-side management strategy of an individual agent is increasingly affected by the dynamic strategies of the other agents. In addition, existing demand-side management methods, which need to obtain users' power consumption information, seriously threaten users' privacy. To address the dynamics issue in the multi-microgrid demand-side management model, a novel multi-agent reinforcement learning method based on the centralized training and decentralized execution paradigm is presented to mitigate the degradation of training performance caused by the instability of training experience. To protect users' privacy, we design a neural network with fixed parameters as an encryptor that transforms users' energy consumption information from low-dimensional to high-dimensional representations, and we theoretically prove that the proposed encryptor-based privacy-preserving method does not affect the convergence of the reinforcement learning algorithm. We verify the effectiveness of the proposed demand-side management scheme with real-world energy consumption data from Xi'an, Shaanxi, China. Simulation results show that the proposed method effectively improves user satisfaction while reducing bill payments compared with traditional reinforcement learning (RL) methods (i.e., deep Q-learning (DQN), deep deterministic policy gradient (DDPG), QMIX, and multi-agent deep deterministic policy gradient (MADDPG)). The results also demonstrate that the proposed privacy protection scheme effectively protects users' privacy while preserving the performance of the algorithm.
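As an illustration of the fixed-parameter encryptor idea described above, the sketch below uses a randomly initialized, frozen PyTorch network to lift a low-dimensional consumption vector into a higher-dimensional space before it leaves the household. The dimensions, seed handling, and names are assumptions rather than the paper's design, and the paper's convergence proof is not reproduced here.

```python
# Minimal sketch of a fixed-parameter "encryptor": a randomly initialized,
# frozen network lifts a user's low-dimensional consumption vector into a
# higher-dimensional space. Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class Encryptor(nn.Module):
    def __init__(self, in_dim: int = 24, out_dim: int = 128, seed: int = 0):
        super().__init__()
        torch.manual_seed(seed)      # assumed: all parties share the same fixed mapping
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.Tanh(),
        )
        for p in self.parameters():  # fixed parameters: never trained or updated
            p.requires_grad_(False)

    @torch.no_grad()
    def forward(self, consumption: torch.Tensor) -> torch.Tensor:
        return self.net(consumption)

encryptor = Encryptor()
hourly_load = torch.rand(1, 24)      # one day of (normalized) hourly usage
encoded = encryptor(hourly_load)     # what the RL agent observes and trains on
print(encoded.shape)                 # torch.Size([1, 128])
```

Because the mapping is deterministic and never updated, the agent always sees a consistent transformation of the raw data, which is the intuition behind the claim that such an encoding need not disturb RL convergence.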
Abstract: To improve the control performance and convergence speed of automatic generation control (AGC) in integrated energy systems, this paper proposes an AGC method based on Multi-Agent Transfer Soft Actor-Critic with Long-Short Term Memory (MATSAC-LSTM). First, an LSTM network extracts temporal features from collected environment state variables such as the area control error (ACE), and these features serve as the input of the MATSAC algorithm, enabling agents to combine historical information when making fast active-power allocation decisions. Second, a centralized training and decentralized execution framework is adopted: the environment states observed by an agent, together with the actions of the other agents, are fed into that agent's Critic network, so that the agents can share information during training. Finally, transfer learning is used to transfer the Critic and Actor network parameters trained on an old task to the corresponding models for a new task, improving the agents' training efficiency. Case studies are conducted on a modified IEEE standard two-area load-frequency control system model and on a five-area integrated energy system model. Simulation results show that, compared with traditional algorithms such as proportional-integral-derivative control, Q-learning, twin delayed deep deterministic policy gradient, win-or-learn-fast policy hill-climbing with dynamic policies, and Soft Actor-Critic, the proposed MATSAC-LSTM algorithm improves the AGC control performance standard and convergence speed while reducing the system's area control error and frequency deviation.
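The hypothetical PyTorch sketch below illustrates two mechanics this abstract names: LSTM-based temporal feature extraction over a window of ACE signals, and parameter transfer from an old-task model to a new-task model. The shapes, window length, and class names are assumptions, not the paper's code.

```python
# Illustrative sketch of (1) an LSTM turning a window of area-control-error
# (ACE) history into a feature vector for the actor, and (2) transfer
# learning by copying old-task weights into a new-task model.
# Shapes and names are assumptions, not the paper's implementation.
import copy
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, state_dim: int = 3, hidden: int = 32, act_dim: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, state_seq: torch.Tensor) -> torch.Tensor:
        # state_seq: (batch, time, state_dim) window of ACE and related signals
        features, _ = self.lstm(state_seq)
        return self.head(features[:, -1])  # decide from the latest temporal feature

old_task_actor = LSTMActor()
# ... train old_task_actor on the old task ...

# Transfer: initialize the new-task actor from the old task's parameters,
# then fine-tune on the new task instead of learning from scratch.
new_task_actor = LSTMActor()
new_task_actor.load_state_dict(copy.deepcopy(old_task_actor.state_dict()))

ace_window = torch.randn(1, 8, 3)        # 8 past steps of 3 state signals
power_command = new_task_actor(ace_window)  # fast active-power allocation decision
print(power_command.shape)               # torch.Size([1, 1])
```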