Abstract
To address the overestimation and poor convergence of the conventional deep Q-learning algorithm in AGV path planning, an improved double deep Q-learning algorithm is proposed. A prioritized experience replay mechanism and a continuous reward function based on heuristic information are introduced to make training of the AGV agent more effective. An action selection strategy that combines the greedy strategy with the Boltzmann strategy guides the AGV agent to explore the environment thoroughly while it interacts with that environment during training. Simulation results show that the proposed algorithm plans good AGV paths and improves both stability and convergence speed.
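The components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the greedy and Boltzmann strategies are combined, which heuristic drives the reward, or any hyperparameters. The mixing rule in `select_action`, the Manhattan-distance reward in `heuristic_reward`, and every function and parameter name here are assumptions made for illustration. Only `double_dqn_target` follows the standard double DQN rule, in which the online network chooses the next action and the target network evaluates it.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; a higher temperature gives a flatter distribution."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, epsilon, temperature, rng=random):
    """Assumed mixing rule (not from the paper): with probability epsilon,
    sample from the Boltzmann distribution; otherwise act greedily."""
    if rng.random() < epsilon:
        probs = boltzmann_probs(q_values, temperature)
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Standard double DQN target: the online network picks the next action,
    the target network supplies that action's value."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


def heuristic_reward(pos, next_pos, goal, scale=1.0):
    """Assumed continuous reward (not from the paper): positive when the
    Manhattan distance to the goal shrinks, negative when it grows."""
    def dist(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    return scale * (dist(pos) - dist(next_pos))
```

In this sketch the online network prefers action 1 in the next state, so the target uses the target network's value for action 1 (0.3) rather than its own maximum (0.5). That decoupling is how double DQN reduces overestimation.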
Authors
房殿军
周彬彬
赵春苗
ROLF Schmidt
FANG Dianjun; ZHOU Binbin; ZHAO Chunmiao; SCHMIDT Rolf (School of Mechanical Engineering, Tongji University, Shanghai 200092; Qingdao Sino-German Institute of Intelligent Technologies, Qingdao 266000; Suzhou i-COW Intelligent Logistics Technology Co., Ltd., Suzhou 215000, China)
Source
《物流技术》
2023, No. 6, pp. 45-51 (7 pages)
Logistics Technology
Funding
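National Key Research and Development Program of China, "Intergovernmental International Science and Technology Innovation Cooperation" Key Special Project (2022YFE0114300).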
Keywords
AGV path planning
reinforcement learning
discrete manufacturing system
DQN algorithm