
Reinforcement learning-based optimization algorithm for energy management and path planning of robot chassis

Abstract: To address the shortened battery life and low battery utilization caused by ignoring ground roughness in conventional path planning for greenhouse robot chassis, this study investigated three reinforcement learning algorithms that integrate battery energy management with path planning. First, a graded pre-scoring reward model was constructed from prior knowledge, and the Manhattan distance was added to the reward function to improve battery life and utilization. Second, to overcome the low convergence efficiency and susceptibility to local optima of the traditional Q-learning (QL) algorithm, an adaptive multi-step Q-learning algorithm (AMQL) with adaptive step sizes and an adaptive ε-greedy Q-learning algorithm (AEQL) with an adaptive exploration rate were proposed to improve QL performance. Furthermore, to improve practical feasibility, AMQL and AEQL were fused into an adaptive multi-step and ε-greedy Q-learning algorithm (AMEQL), and the performance of AMQL and AMEQL relative to traditional QL was verified by simulation on three different ridge-row layouts. Simulation results show that, compared with traditional QL, AMQL reduced the average training time by 23.74%, the average number of iterations to convergence by 8.82%, the average number of path turns by 54.29%, and the average post-convergence fluctuations by 14.54%; AMEQL reduced the average training time by 34.46%, the average number of iterations to convergence by 18.02%, the average number of path turns by 63.13%, and the average post-convergence fluctuations by 15.62%. Over 400 iterations, AMEQL fluctuated on average once every 7.12 iterations after reaching the maximum reward, versus once every 6.68 iterations for AMQL. AMEQL therefore achieved the shortest training time, fastest convergence, fewest path turns, and smallest reward fluctuation, with AMQL second. The algorithm can provide a theoretical reference for autonomous path planning of robot chassis.

Ground roughness can significantly impact battery performance in greenhouse environments. In this study, battery energy management was integrated with path planning to address this challenge. A systematic investigation was also implemented to explore the effects of ground roughness on the battery life and utilization efficiency of greenhouse vehicle platforms. A graded pre-scoring model was constructed using prior knowledge. Additionally, the Manhattan distance between the vehicle's current position and the target point was incorporated into the reinforcement learning reward function, thus linking travel distance with battery life to optimize both battery utilization efficiency and battery life during path planning. An adaptive multi-step Q-learning algorithm (AMQL) with adaptive step sizes and an adaptive ε-greedy Q-learning algorithm (AEQL) with an adaptive exploration rate were proposed to enhance the performance of the Q-learning algorithm, which suffers from long iteration times, low convergence efficiency, susceptibility to local optima, and excessive path turns. The AMQL algorithm adjusted the step size according to a forward reward assessment: if the reward at the current position increased relative to the previous reward, the step size increased; as the current position approached the endpoint, the step size gradually decreased to prevent suboptimal path optimization. The AEQL algorithm adaptively adjusted the exploration rate using the difference between adjacent reward values: ε increased when the adjacent reward value increased, and decreased when the reward value decreased. Although AMQL improved the convergence efficiency and iteration speed, the variations in step size caused significant fluctuations in rewards, resulting in lower algorithm stability, and the multi-step length had no outstanding impact on convergence efficiency or iteration speed. The AEQL algorithm enhanced exploration efficiency and algorithm stability through dynamic adjustment, but its fluctuating rise during the initial training phase also increased the training time. Therefore, the AMQL and AEQL algorithms were combined into an adaptive multi-step and ε-greedy Q-learning algorithm (AMEQL) to ensure faster and more optimal global path selection during path planning. In a simulated environment, a realistic greenhouse tomato scenario was first modeled. An inertial measurement unit (IMU) was then used to record changes in aisle roughness in real time, and this data was incorporated into the simulation model. Finally, 300 rounds of simulation experiments were carried out to test the traditional Q-learning, AMQL, and AMEQL algorithms for path planning in single-row (30 m × 20 m), double-row (50 m × 50 m), and triple-row (70 m × 50 m) environments. Simulation results show that the AMEQL algorithm reduced the average training time by 44.10%, the average number of iterations required for convergence by 11.06%, the number of path turns by 63.13%, and the post-convergence average fluctuation by 15.62%, compared with the traditional Q-learning. Owing to its higher convergence speed over 400 iterations, the AMEQL algorithm averaged 14 fluctuations per 100 iterations after reaching the maximum reward, while the AMQL algorithm averaged 15 fluctuations. This algorithm can provide a theoretical reference for the autonomous path planning of greenhouse platforms.
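As a rough illustration of the reward design described in the abstract, the following Python sketch combines a graded pre-score of grid cells by ground roughness with a Manhattan-distance progress term toward the goal. The roughness tiers, weights, and function names are assumptions chosen for illustration, not values from the paper.

```python
# Hypothetical sketch of a graded pre-scoring reward with a Manhattan-distance term.
# Roughness thresholds, weights, and bonus values are illustrative assumptions.

def prescore_roughness(roughness):
    """Graded pre-score: smoother ground earns a higher (less negative) score."""
    if roughness < 0.2:      # smooth aisle
        return 0.0
    elif roughness < 0.5:    # moderately rough
        return -0.5
    else:                    # very rough, costly for the battery
        return -1.0

def manhattan(p, q):
    """Manhattan distance between two grid cells."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def reward(state, next_state, goal, roughness_map,
           w_rough=1.0, w_dist=0.1, goal_bonus=10.0):
    """Reward = roughness pre-score + progress toward the goal by Manhattan distance.

    roughness_map is assumed to be a dict mapping grid cells to roughness values.
    """
    if next_state == goal:
        return goal_bonus
    progress = manhattan(state, goal) - manhattan(next_state, goal)
    return w_rough * prescore_roughness(roughness_map[next_state]) + w_dist * progress
```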
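The adaptive mechanisms of AMQL, AEQL, and their fusion AMEQL are described only at a high level in the abstract; the sketch below is one possible reading of that description, with the bounds, increments, and class name chosen as assumptions for illustration rather than taken from the paper.

```python
import random
from collections import defaultdict

class AMEQLAgent:
    """Illustrative fusion of adaptive multi-step size (AMQL) and adaptive
    epsilon-greedy exploration (AEQL). All constants are assumed."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # Q-table keyed by (state, action)
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.epsilon = 0.9            # exploration rate, adapted online
        self.step = 1                 # multi-step size, adapted online
        self.prev_reward = None

    def choose(self, state):
        # Standard epsilon-greedy selection over the current Q estimates.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def adapt(self, reward, dist_to_goal):
        # AEQL idea: raise epsilon when the adjacent reward increased, lower it otherwise.
        if self.prev_reward is not None:
            if reward > self.prev_reward:
                self.epsilon = min(0.95, self.epsilon + 0.01)
            else:
                self.epsilon = max(0.05, self.epsilon - 0.01)
        # AMQL idea: grow the step size while rewards improve, shrink it near the goal.
        if self.prev_reward is not None and reward > self.prev_reward:
            self.step = min(4, self.step + 1)
        if dist_to_goal <= self.step:
            self.step = 1
        self.prev_reward = reward

    def update(self, state, action, reward, next_state):
        # One-step Q-learning backup; the multi-step variant would accumulate
        # discounted rewards over self.step transitions before this backup.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```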
Authors: 李潇宇 (LI Xiaoyu), 张君华 (ZHANG Junhua), 郭晓光 (GUO Xiaoguang), 伍纲 (WU Gang) (School of Mechanical and Electrical Engineering, Beijing Information Science and Technology University, Beijing 100192, China; Institute of Agricultural Environment and Sustainable Development, Chinese Academy of Agricultural Sciences, Beijing 100081, China)
Source: Transactions of the Chinese Society of Agricultural Engineering (《农业工程学报》), 2024, No. 21, pp. 175-183 (9 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (12272057).
Keywords: greenhouse; path planning; reinforcement learning; energy management; multi-objective optimization