Autonomous Decision-making of Searching and Rescue Robots Based on Off-policy Hierarchical Reinforcement Learning in a Complex Interactive Environment

Abstract: The ability of robots to make decisions autonomously in search and rescue tasks is of great significance for reducing the risk to human rescuers. To enable a robot to form decisions and plan reasonable paths on its own when facing complex search and rescue tasks with multiple solutions, an off-policy hierarchical reinforcement learning algorithm is designed. The algorithm consists of two layers of Soft Actor-Critic (SAC) agents: the higher-level agent automatically generates the goals required by the lower-level agent and provides intrinsic rewards that guide the lower-level agent as it interacts directly with the environment. Within the hierarchical reinforcement learning framework, the robot search and rescue task in a complex interactive environment is first described as a two-layer structure consisting of a high-level semi-Markov decision process and a low-level Markov decision process, with separate state spaces, action spaces, and reward functions designed for each level. Second, to address the problem that goals and reward functions in traditional reinforcement learning algorithms must be designed manually and lack generality, a SAC-based off-policy hierarchical reinforcement learning algorithm is applied to train a bipedal mobile robot to interact with the complex environment; autonomous decision-making of the rescue robot is achieved through efficient use of data and adjustment of the goal space. Simulation results verify the effectiveness and generality of the proposed algorithm in solving complex multi-path search and rescue tasks.
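The two-level interaction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the intrinsic reward, the goal period, or the environment, so the following are all assumptions made for illustration: the negative-distance intrinsic reward, the fixed `goal_period`, the 2-D point environment, and every function name. The two toy policies are hand-written placeholders standing in for the trained SAC agents.

```python
from typing import Callable, List, Tuple

Vec = Tuple[float, float]
TARGET: Vec = (1.0, 1.0)  # hypothetical rescue target for the toy environment


def distance(a: Vec, b: Vec) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def intrinsic_reward(next_state: Vec, goal: Vec) -> float:
    # Assumption: the low level is rewarded for getting close to the goal.
    return -distance(next_state, goal)


def toy_env_step(state: Vec, action: Vec) -> Tuple[Vec, float]:
    # Hypothetical environment: a point moving in the plane.
    nxt = (state[0] + action[0], state[1] + action[1])
    return nxt, -distance(nxt, TARGET)


def toy_high_policy(state: Vec) -> Vec:
    # Placeholder for the high-level SAC agent: always proposes the target.
    return TARGET


def toy_low_policy(state: Vec, goal: Vec) -> Vec:
    # Placeholder for the low-level SAC agent: step of at most 0.1 toward the goal.
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    norm = max(distance(state, goal), 1e-8)
    step = min(0.1, norm)
    return (step * dx / norm, step * dy / norm)


def run_episode(high_policy: Callable, low_policy: Callable, env_step: Callable,
                start: Vec, horizon: int = 20, goal_period: int = 5):
    """Collect transitions for both levels over one episode."""
    state = start
    high_buf: List[tuple] = []  # (s, goal, summed env reward, s') per goal segment
    low_buf: List[tuple] = []   # (s, goal, action, intrinsic reward, s') per step
    goal, seg_start, seg_return = None, state, 0.0
    for t in range(horizon):
        if t % goal_period == 0:
            if goal is not None:
                # Close the previous segment: one high-level (semi-MDP) transition.
                high_buf.append((seg_start, goal, seg_return, state))
            goal = high_policy(state)
            seg_start, seg_return = state, 0.0
        action = low_policy(state, goal)
        next_state, env_reward = env_step(state, action)
        low_buf.append((state, goal, action,
                        intrinsic_reward(next_state, goal), next_state))
        seg_return += env_reward
        state = next_state
    high_buf.append((seg_start, goal, seg_return, state))
    return high_buf, low_buf


high_buf, low_buf = run_episode(toy_high_policy, toy_low_policy,
                                toy_env_step, start=(0.0, 0.0))
print(len(high_buf), len(low_buf))
```

In an off-policy setting, both buffers would be sampled repeatedly to update the respective agents, which is where the data efficiency mentioned in the abstract would come from. The sketch only shows how the two kinds of transitions could be collected.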

Authors: YIN Chenkun (殷辰堃); JI Hongxuan (纪宏萱); ZHANG Yanxin (张严心) (School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 10004, China)

Affiliation: School of Electronic and Information Engineering, Beijing Jiaotong University

Source: Journal of Beijing University of Technology (《北京工业大学学报》); indexed in CAS, CSCD, and the Peking University Core Journals list; 2023, No. 4, pp. 403-414 (12 pages)

Funding: General Program of the National Natural Science Foundation of China (62273028, 62073025, 62073026).

Keywords: hierarchical reinforcement learning; Soft Actor-Critic algorithm; searching and rescue tasks; bipedal mobile robots; autonomous decision-making; interactive environment

Classification: U461 [Mechanical Engineering: Vehicle Engineering]; TP308 [Automation and Computer Technology: Computer System Architecture]

