Safe Reinforcement Learning Method Based on Robust Cross-Entropy and Gradient Optimization

Abstract: Ensuring both safety and efficiency when an agent performs tasks in complex environments is a major challenge. Traditional approaches to agent decision-making rely on model-free reinforcement learning, which searches for an optimal policy through trial and error over large amounts of data. This ignores the agent's training cost and safety risks, so the safety of the resulting decisions cannot be effectively guaranteed. To address this, safety constraints are imposed on the agent's actions within a model predictive control framework, and a safe reinforcement learning algorithm is designed to obtain the safest action control sequence. In addition, because the cross-entropy method is computationally expensive and inefficient, while gradient-based optimization tends to get trapped in local optima, robust cross-entropy and gradient optimization are combined to optimize the action control sequence, improving both the safety and the solving efficiency of the algorithm. Experiments show that the proposed method converges faster than the robust cross-entropy method and achieves the best safety performance among the compared optimization algorithms without a large loss in task performance.
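The abstract does not give the algorithm's details, so the following is only a minimal sketch of one plausible way to combine the two optimizers inside a model predictive control step. Everything concrete in it is an assumption made for illustration: the toy 1-D dynamics in `rollout`, the wall-crossing cost, the penalty weight, the hyperparameters, and all function names. The elite-selection rule (rank by cost when too few samples are feasible, otherwise by reward among feasible samples) is the usual description of robust cross-entropy, and finite differences stand in for whatever gradient computation the paper actually uses.

```python
import numpy as np

PENALTY = 100.0  # hypothetical weight on constraint violation


def rollout(x0, actions, goal=1.0, wall=0.6):
    """Toy 1-D model (assumed): x' = x + 0.1 * a. Returns (reward, cost)."""
    x, reward, cost = x0, 0.0, 0.0
    for a in actions:
        x = x + 0.1 * np.clip(a, -1.0, 1.0)
        reward -= (x - goal) ** 2      # task reward: get close to the goal
        cost += max(0.0, x - wall)     # safety cost: distance past the wall
    return reward, cost


def robust_cem(x0, horizon=10, pop=200, n_elite=20, iters=5, seed=0):
    """Coarse global search over action sequences with safety-aware elites."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        samples = mean + std * rng.standard_normal((pop, horizon))
        results = np.array([rollout(x0, s) for s in samples])
        rewards, costs = results[:, 0], results[:, 1]
        feasible = np.flatnonzero(costs <= 0.0)
        if feasible.size >= n_elite:
            # Enough safe samples: keep the highest-reward safe ones.
            elite_idx = feasible[np.argsort(-rewards[feasible])[:n_elite]]
        else:
            # Too few safe samples: keep the least unsafe ones.
            elite_idx = np.argsort(costs)[:n_elite]
        elites = samples[elite_idx]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean


def objective(x0, actions):
    reward, cost = rollout(x0, actions)
    return reward - PENALTY * cost


def refine(x0, actions, steps=30, lr=0.05, eps=1e-4):
    """Local gradient ascent; a step is kept only if cost does not increase."""
    best = np.clip(actions, -1.0, 1.0)
    best_r, best_c = rollout(x0, best)
    for _ in range(steps):
        # Central finite-difference gradient of the penalised objective.
        grad = np.array([
            (objective(x0, best + eps * e) - objective(x0, best - eps * e))
            / (2 * eps)
            for e in np.eye(best.size)
        ])
        cand = np.clip(best + lr * grad, -1.0, 1.0)
        r, c = rollout(x0, cand)
        if c <= best_c and r - PENALTY * c > best_r - PENALTY * best_c:
            best, best_r, best_c = cand, r, c
        else:
            lr *= 0.5  # reject the step and shrink the step size
    return best


coarse_plan = robust_cem(0.0)
refined_plan = refine(0.0, coarse_plan)
```

In a receding-horizon loop, only the first action of `refined_plan` would be executed before replanning from the new state. The sampling stage supplies a starting point that is less likely to sit in a poor local optimum, and the gradient stage then needs far fewer model evaluations than further sampling iterations would, which is the trade-off the abstract describes.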

Authors: ZHOU Xianwei, ZHANG Kun, YE Xin (School of Software, South China Normal University, Foshan 538200, China)

Affiliation: School of Software, South China Normal University

Source: Software Guide (《软件导刊》), 2024, No. 9, pp. 143-149 (7 pages)

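Funding: Guangdong Basic and Applied Basic Research Foundation (2020A1515110783); Guangdong Enterprise Science and Technology Commissioner Project (GDKTP2020014000); Foshan High-Level Seconded Talent Project (303475).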

Keywords: reinforcement learning; robust cross-entropy; gradient optimization; safety

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
