Abstract
Traditional reactive obstacle avoidance algorithms suffer from poor flexibility and limited generality. To address this, this paper proposes a fast obstacle avoidance method based on the Soft Actor-Critic (SAC) deep reinforcement learning algorithm. The study analyzes the framework and update strategy of the SAC algorithm, builds an experimental simulation environment with ROS and RVIZ, optimizes the state input of the SAC algorithm, and trains and validates the agent in virtual environments of a post-disaster middle yard and lower yard, respectively. The results show that, after the SAC algorithm is optimized, GPU computing speed and the noise added by the algorithm cause the agent's reward to fluctuate, but the final reward converges to a stable value. This resolves the pre-optimization problem in which the reward decreased as training continued, and it substantially improves obstacle avoidance performance. The research provides a basis for fast obstacle avoidance control of intelligent vehicles for mine emergency rescue.
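For readers unfamiliar with SAC's update strategy, the NumPy sketch below illustrates three pieces of the standard SAC formulation. The first is sampling a tanh-squashed Gaussian action together with its log-probability; this sampling noise is the stochasticity the abstract refers to. The second is the entropy-regularized (soft) Bellman target for the twin critics. The third is the Polyak averaging used to update the target networks. It is not the authors' implementation. The function names and the default values of `gamma`, `alpha`, and `tau` are hypothetical placeholders, and the neural networks are replaced by plain arrays.

```python
import numpy as np

def squashed_gaussian_sample(mu, log_std, rng):
    """Sample a tanh-squashed Gaussian action; return (action, log_prob).

    Illustrative only: mu and log_std would normally come from a policy network.
    """
    std = np.exp(log_std)
    eps = rng.standard_normal(mu.shape)     # exploration noise
    u = mu + std * eps                      # pre-squash Gaussian sample
    a = np.tanh(u)                          # bound each action dimension to (-1, 1)
    # Gaussian log-density of u, summed over action dimensions
    log_prob = (-0.5 * (eps**2 + 2.0 * log_std + np.log(2.0 * np.pi))).sum(axis=-1)
    # Change-of-variables correction for the tanh squashing
    log_prob -= np.log(1.0 - a**2 + 1e-6).sum(axis=-1)
    return a, log_prob

def soft_bellman_target(r, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Soft target y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    min_q = np.minimum(q1_next, q2_next)    # clipped double-Q to curb overestimation
    return r + gamma * (1.0 - done) * (min_q - alpha * logp_next)

def polyak_update(target, source, tau=0.005):
    """Move each target parameter a fraction tau toward the online parameter."""
    return {k: (1.0 - tau) * target[k] + tau * source[k] for k in target}

# Usage: one transition with hand-picked numbers
y = soft_bellman_target(np.array([1.0]), np.array([0.0]),
                        np.array([2.0]), np.array([3.0]), np.array([-2.0]),
                        gamma=0.5, alpha=0.5)
print(y)  # → [2.5]
```

The entropy term `-alpha * logp_next` rewards the policy for keeping its action distribution broad. This is one reason SAC rewards fluctuate during training even when the long-run trend is stable.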
Authors
Shan Qiyuan (单麒源)
Zhang Zhihao (张智豪)
Zhang Yaoxin (张耀心)
Yu Zongxiang (余宗祥)
School of Mining Engineering, Heilongjiang University of Science & Technology, Harbin 150022, China
Source
Journal of Heilongjiang University of Science and Technology (《黑龙江科技大学学报》)
CAS
2021, No. 1, pp. 14-20 (7 pages)
Funding
Fundamental Research Funds for Provincial Universities of Heilongjiang Province (2018-KYYWF-1173).
Keywords
mine emergency rescue
deep reinforcement learning
reactive obstacle avoidance
Soft Actor-Critic algorithm