This paper describes a real-time beam tuning method with an improved asynchronous advantage actor–critic(A3C)algorithm for accelerator systems.The operating parameters of devices are usually inconsistent with the pre...This paper describes a real-time beam tuning method with an improved asynchronous advantage actor–critic(A3C)algorithm for accelerator systems.The operating parameters of devices are usually inconsistent with the predictions of physical designs because of errors in mechanical matching and installation.Therefore,parameter optimization methods such as pointwise scanning,evolutionary algorithms(EAs),and robust conjugate direction search are widely used in beam tuning to compensate for this inconsistency.However,it is difficult for them to deal with a large number of discrete local optima.The A3C algorithm,which has been applied in the automated control field,provides an approach for improving multi-dimensional optimization.The A3C algorithm is introduced and improved for the real-time beam tuning code for accelerators.Experiments in which optimization is achieved by using pointwise scanning,the genetic algorithm(one kind of EAs),and the A3C-algorithm are conducted and compared to optimize the currents of four steering magnets and two solenoids in the low-energy beam transport section(LEBT)of the Xi’an Proton Application Facility.Optimal currents are determined when the highest transmission of a radio frequency quadrupole(RFQ)accelerator downstream of the LEBT is achieved.The optimal work points of the tuned accelerator were obtained with currents of 0 A,0 A,0 A,and 0.1 A,for the four steering magnets,and 107 A and 96 A for the two solenoids.Furthermore,the highest transmission of the RFQ was 91.2%.Meanwhile,the lower time required for the optimization with the A3C algorithm was successfully verified.Optimization with the A3C algorithm consumed 42%and 78%less time than pointwise scanning with random initialization and pre-trained initialization of weights,respectively.展开更多
为解决由于固定温度SAC(Soft Actor Critic)算法中存在的Q函数高估可能会导致算法陷入局部最优的问题,通过深入分析提出了一个稳定且受限的SAC算法(SCSAC:Stable Constrained Soft Actor Critic)。该算法通过改进最大熵目标函数修复固...为解决由于固定温度SAC(Soft Actor Critic)算法中存在的Q函数高估可能会导致算法陷入局部最优的问题,通过深入分析提出了一个稳定且受限的SAC算法(SCSAC:Stable Constrained Soft Actor Critic)。该算法通过改进最大熵目标函数修复固定温度SAC算法中的Q函数高估问题,同时增强算法在测试过程中稳定性的效果。最后,在4个OpenAI Gym Mujoco环境下对SCSAC算法进行了验证,实验结果表明,稳定且受限的SAC算法相比固定温度SAC算法可以有效减小Q函数高估出现的次数并能在测试中获得更加稳定的结果。展开更多
虚拟电厂(virtual power plant,VPP)作为多能流互联的综合能源网络,已成为中国加速实现双碳目标的重要角色。但VPP内部资源协同低碳调度面临多能流的耦合程度紧密、传统碳交易模型参数主观性强、含高维动态参数的优化目标在线求解困难...虚拟电厂(virtual power plant,VPP)作为多能流互联的综合能源网络,已成为中国加速实现双碳目标的重要角色。但VPP内部资源协同低碳调度面临多能流的耦合程度紧密、传统碳交易模型参数主观性强、含高维动态参数的优化目标在线求解困难等问题。针对这些问题,文中提出一种融合注意力机制(attention mechanism,AM)与柔性动作评价(soft actor-critic,SAC)算法的VPP多能流低碳调度方法。首先,根据VPP的随机碳流特性,面向动态参数建立基于贝叶斯优化的改进阶梯型碳交易机制。接着,以经济效益和碳排放量为目标函数构建含氢VPP多能流解耦模型。然后,考虑到该模型具有高维非线性与权重参数实时更新的特征,利用融合AM的改进SAC深度强化学习算法在连续动作空间对模型进行求解。最后,对多能流调度结果进行仿真分析和对比实验,验证了文中方法的可行性及其相较于原SAC算法较高的决策准确性。展开更多
文摘This paper describes a real-time beam tuning method with an improved asynchronous advantage actor–critic(A3C)algorithm for accelerator systems.The operating parameters of devices are usually inconsistent with the predictions of physical designs because of errors in mechanical matching and installation.Therefore,parameter optimization methods such as pointwise scanning,evolutionary algorithms(EAs),and robust conjugate direction search are widely used in beam tuning to compensate for this inconsistency.However,it is difficult for them to deal with a large number of discrete local optima.The A3C algorithm,which has been applied in the automated control field,provides an approach for improving multi-dimensional optimization.The A3C algorithm is introduced and improved for the real-time beam tuning code for accelerators.Experiments in which optimization is achieved by using pointwise scanning,the genetic algorithm(one kind of EAs),and the A3C-algorithm are conducted and compared to optimize the currents of four steering magnets and two solenoids in the low-energy beam transport section(LEBT)of the Xi’an Proton Application Facility.Optimal currents are determined when the highest transmission of a radio frequency quadrupole(RFQ)accelerator downstream of the LEBT is achieved.The optimal work points of the tuned accelerator were obtained with currents of 0 A,0 A,0 A,and 0.1 A,for the four steering magnets,and 107 A and 96 A for the two solenoids.Furthermore,the highest transmission of the RFQ was 91.2%.Meanwhile,the lower time required for the optimization with the A3C algorithm was successfully verified.Optimization with the A3C algorithm consumed 42%and 78%less time than pointwise scanning with random initialization and pre-trained initialization of weights,respectively.
文摘大规模阵列天线技术(Massive Multiple Input Multiple Output,Massive MIMO)作为第五代移动通信(5G)的无线核心技术,实现了多波束空间覆盖增强,然而5G Massive MIMO的多波束射频高能耗、多波束碰撞和增加的干扰造会成5G网络能效下降,运营成本增高。基于3D数字地图、基站工程参数、终端上报的测量报告/最小化路测(Measurement Report/Minimization of Drive Test,MR/MDT)数据、用户/业务分布构建的三维数字孪生栅格,通过卷积长短期记忆(Convolutional Long Short Term Memory,Conv-LSTM)算法对栅格内的用户分布、业务分布进行分析和预测,通过Actor-Critic架构对5G波束配置和优化策略进行评估,实现不同场景、时段的5G波束最佳能效,智能适应5G网络潮汐效应,实现“网随业动”。
文摘为解决由于固定温度SAC(Soft Actor Critic)算法中存在的Q函数高估可能会导致算法陷入局部最优的问题,通过深入分析提出了一个稳定且受限的SAC算法(SCSAC:Stable Constrained Soft Actor Critic)。该算法通过改进最大熵目标函数修复固定温度SAC算法中的Q函数高估问题,同时增强算法在测试过程中稳定性的效果。最后,在4个OpenAI Gym Mujoco环境下对SCSAC算法进行了验证,实验结果表明,稳定且受限的SAC算法相比固定温度SAC算法可以有效减小Q函数高估出现的次数并能在测试中获得更加稳定的结果。
文摘虚拟电厂(virtual power plant,VPP)作为多能流互联的综合能源网络,已成为中国加速实现双碳目标的重要角色。但VPP内部资源协同低碳调度面临多能流的耦合程度紧密、传统碳交易模型参数主观性强、含高维动态参数的优化目标在线求解困难等问题。针对这些问题,文中提出一种融合注意力机制(attention mechanism,AM)与柔性动作评价(soft actor-critic,SAC)算法的VPP多能流低碳调度方法。首先,根据VPP的随机碳流特性,面向动态参数建立基于贝叶斯优化的改进阶梯型碳交易机制。接着,以经济效益和碳排放量为目标函数构建含氢VPP多能流解耦模型。然后,考虑到该模型具有高维非线性与权重参数实时更新的特征,利用融合AM的改进SAC深度强化学习算法在连续动作空间对模型进行求解。最后,对多能流调度结果进行仿真分析和对比实验,验证了文中方法的可行性及其相较于原SAC算法较高的决策准确性。