Abstract
Reinforcement learning offers many advantages for robot manipulation, but traditional reinforcement learning algorithms struggle with sparse rewards, and the resulting policies are difficult to apply directly in real environments. To improve the success rate of policy transfer from simulation to reality, this paper proposes a goal-based domain randomization method. The model is trained with a goal-based reinforcement learning algorithm, which copes effectively with the sparse rewards of robot manipulation tasks and yields policies that run well in simulation. The algorithm also applies goal-conditioned domain randomization, which improves the generality of the policy and helps bridge the gap between simulation and reality, so that policies learned in simulation transfer readily to the real environment and execute successfully. The results show that a reinforcement learning algorithm using goal-based domain randomization helps improve the success rate of policy transfer from simulation to reality.
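The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say which goal-based algorithm or which randomized parameters are used. The sketch assumes hindsight goal relabeling as one common goal-based technique for sparse rewards, and it samples simulator parameters uniformly rather than in the goal-conditioned way the paper proposes. All function names, the tolerance, and the parameter names (`friction`, `mass`) are hypothetical.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Sparse goal reward: 0.0 if within tol of the goal, otherwise -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode, tol=0.05):
    """Hindsight relabeling (assumed technique): pretend the state reached
    at the end of the episode was the goal, and recompute every reward."""
    new_goal = episode[-1]["achieved"]
    return [
        {**step, "goal": new_goal,
         "reward": sparse_reward(step["achieved"], new_goal, tol)}
        for step in episode
    ]


def sample_sim_params(rng, ranges):
    """Plain uniform domain randomization over hypothetical simulator
    parameters; each value is drawn independently from its (low, high) range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


# A failed two-step episode: the original goal (1.0, 1.0) is never reached.
episode = [
    {"achieved": (0.0, 0.0), "goal": (1.0, 1.0), "reward": -1.0},
    {"achieved": (0.5, 0.5), "goal": (1.0, 1.0), "reward": -1.0},
]
relabeled = relabel_with_final_goal(episode)

rng = random.Random(0)
params = sample_sim_params(rng, {"friction": (0.5, 1.5), "mass": (0.1, 0.3)})
```

After relabeling, the final step of the failed episode carries a success reward, which gives the learner a training signal even though the original goal was missed. A new set of simulator parameters would typically be drawn at the start of each training episode.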
Authors
Zhang Xiayu (张夏禹)
Chen Xiaoping (陈小平)
University of Science and Technology of China, Hefei 230026, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2022, No. 10, pp. 3084-3088 (5 pages)
Funding
Supported by the National Key Research and Development Program of China (2019YFE0125200).
Keywords
reinforcement learning
domain randomization
robot manipulation
sim-to-real transfer