Abstract
To achieve precise navigation of live-working robot arms (manipulators) in the power grid, a global weighted reward mechanism is proposed. A precise navigation model for the manipulator is then built on this mechanism and the double deep Q-network (DDQN) algorithm. The model addresses Q-value overestimation and the low efficiency of Q-value updates. Obstacle avoidance and navigation of the manipulator during cross-line operation are studied in simulation. The results show three things:
- The optimal learning rate is 0.005.
- The global weighted reward mechanism improves Q-value update efficiency more effectively than the immediate reward of the current state.
- The cross-line operation model built on the global weighted reward mechanism and DDQN reaches a post-convergence deviation of ±6.45.

Compared with the baseline, the resulting navigation model converges faster, is more accurate, and generalizes better. It achieves precise navigation for robotic live working.
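The abstract credits DDQN with reducing Q-value overestimation. The snippet below is a minimal NumPy sketch of the standard Double DQN target, which is where that reduction comes from. It assumes discrete actions and batched Q-value arrays of shape (batch, actions).

Vanilla DQN lets the target network both pick and score the next action through a max. With noisy estimates, that max is biased upward. Double DQN instead picks the action with the online network and scores it with the target network.

This is not the authors' implementation. It also omits their global weighted reward mechanism, whose formula is not given here. The function names are hypothetical.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Vanilla DQN target: the target network both selects and evaluates the
    next action via max, which is biased upward when estimates are noisy."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN target: the online network selects the greedy next action,
    and the target network evaluates that selected action."""
    best_actions = next_q_online.argmax(axis=1)
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (dones == 1) bootstrap nothing beyond the reward.
    return rewards + gamma * (1.0 - dones) * evaluated


if __name__ == "__main__":
    # Toy check of overestimation bias: every true Q-value is 0, and the
    # online and target estimates carry independent unit Gaussian noise.
    rng = np.random.default_rng(0)
    n_samples, n_actions = 10_000, 5
    online = rng.normal(size=(n_samples, n_actions))
    target = rng.normal(size=(n_samples, n_actions))
    zeros = np.zeros(n_samples)
    print("DQN mean target:       ",
          dqn_targets(zeros, target, zeros, gamma=1.0).mean())
    print("Double DQN mean target:",
          double_dqn_targets(zeros, online, target, zeros, gamma=1.0).mean())
```

In this toy setting the vanilla DQN mean target lands at roughly 1.16, the expected maximum of five standard normals, although every true value is 0. The Double DQN mean stays close to 0.

In a full agent, these targets would feed a regression loss on the online network. The learning rate the abstract reports (0.005) would then be that optimizer's step size.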
Authors
LI Ning (李宁); HE Yiliang (何义良); ZHAO Jianhui (赵建辉); LIU Zhaowei (刘兆威); TIAN Zhi (田志)
Hengshui Power Supply Branch, State Grid Hebei Electric Power Co., Ltd., Hengshui 053000, Hebei, China
Source
Power System and Clean Energy (《电网与清洁能源》), 2024, No. 11, pp. 9-15 (7 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
Science and Technology Project of State Grid Corporation of China (kj2021-044).
Keywords
live working
manipulator
deep reinforcement learning
double deep Q-network
autonomous navigation