Safe Reinforcement Learning Method Based on Robust Cross-Entropy and Gradient Optimization
Abstract: Ensuring both safety and efficiency when an agent performs tasks in a complex environment is a major challenge. Traditional approaches to agent decision-making rely on model-free reinforcement learning, which searches for an optimal policy by trial and error over large amounts of data. This ignores the agent's training cost and safety risks, and therefore cannot effectively guarantee safe decisions. To address this, safety constraints are imposed on the agent's actions within a model predictive control framework, and a safe reinforcement learning algorithm is designed to obtain the safest action control sequence. In addition, because the cross-entropy method is computationally expensive and inefficient, while gradient-based optimization tends to get trapped in local optima, robust cross-entropy and gradient optimization are combined to optimize the action control sequence, improving both the safety and the solution efficiency of the algorithm. Experiments show that the proposed method converges faster than the robust cross-entropy method and, compared with other optimization algorithms, achieves the best safety performance without sacrificing much task performance.
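The abstract does not give algorithmic details, so the following is only a minimal sketch of the general idea it describes: a sampling-based planner whose elite selection respects a safety constraint, followed by a gradient step that refines the resulting action sequence. Everything concrete here is an assumption made for illustration and not taken from the paper: the 1-D toy dynamics, the safety bound `X_LIMIT`, the goal `TARGET`, the finite-difference gradient, and the function names `rollout`, `select_elites`, `cem_plan`, and `refine`.

```python
import random

H, N, K = 5, 64, 8           # horizon, samples per iteration, elite count (illustrative)
X_LIMIT, TARGET = 2.0, 3.0   # hypothetical safety bound and goal position

def clip(a):
    """Keep an action inside the assumed bounds [-1, 1]."""
    return max(-1.0, min(1.0, a))

def rollout(actions, x0=0.0):
    """Return (total reward, total constraint cost) for toy dynamics x' = x + a."""
    x, reward, cost = x0, 0.0, 0.0
    for a in actions:
        x += clip(a)
        reward -= (x - TARGET) ** 2       # closer to the goal is better
        cost += max(0.0, x - X_LIMIT)     # amount by which the bound is violated
    return reward, cost

def select_elites(samples):
    """Constraint-aware elite selection: prefer feasible samples ranked by
    reward; if fewer than K are feasible, rank all samples by lowest cost."""
    scored = [(rollout(s), s) for s in samples]
    feasible = [p for p in scored if p[0][1] == 0.0]
    if len(feasible) >= K:
        feasible.sort(key=lambda p: p[0][0], reverse=True)
        return [s for _, s in feasible[:K]]
    scored.sort(key=lambda p: p[0][1])
    return [s for _, s in scored[:K]]

def cem_plan(rng, iters=10):
    """Cross-entropy loop: sample sequences, keep elites, refit mean and std."""
    mean, std = [0.0] * H, [1.0] * H
    for _ in range(iters):
        samples = [[clip(rng.gauss(m, s)) for m, s in zip(mean, std)]
                   for _ in range(N)]
        elites = select_elites(samples)
        mean = [sum(e[t] for e in elites) / K for t in range(H)]
        std = [max(1e-3, (sum((e[t] - mean[t]) ** 2 for e in elites) / K) ** 0.5)
               for t in range(H)]
    return mean

def refine(actions, steps=20, lr=0.05, eps=1e-4):
    """Finite-difference gradient ascent on reward. A step is accepted only if
    reward improves and constraint cost does not increase; otherwise the step
    size is halved."""
    best = [clip(a) for a in actions]
    best_r, best_c = rollout(best)
    for _ in range(steps):
        grad = []
        for t in range(H):
            bumped = list(best)
            bumped[t] += eps
            grad.append((rollout(bumped)[0] - best_r) / eps)
        cand = [clip(a + lr * g) for a, g in zip(best, grad)]
        r, c = rollout(cand)
        if c <= best_c and r > best_r:
            best, best_r, best_c = cand, r, c
        else:
            lr *= 0.5
    return best

rng = random.Random(0)
coarse = cem_plan(rng)
refined = refine(coarse)
print("CEM plan (reward, cost):    ", rollout(coarse))
print("after refine (reward, cost):", rollout(refined))
```

In this sketch the sampling stage supplies a starting point that is not tied to a single local optimum, and the gradient stage can then improve it with far fewer rollouts than further sampling rounds would need. The accept-only-if-not-less-safe rule in `refine` is a design choice of this example, not a detail reported in the abstract.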
Authors: ZHOU Xianwei; ZHANG Kun; YE Xin (School of Software, South China Normal University, Foshan 538200, China)
Source: Software Guide (《软件导刊》), 2024, No. 9, pp. 143-149 (7 pages)
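Funding: Guangdong Basic and Applied Basic Research Foundation (2020A1515110783); Guangdong Provincial Enterprise Science and Technology Commissioner Project (GDKTP2020014000); Foshan High-Level Stationed Talent Project (303475).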
Keywords: reinforcement learning; robust cross-entropy; gradient optimization; safety