Abstract
This work studies the design of an algorithmic model for obstacle-avoidance scenarios based on deep reinforcement learning. An improved Deep Q-learning Network (DQN) algorithm is adopted to overcome the limitation of the tabular Q-learning algorithm, whose memory requirements become prohibitive in continuous state spaces. Since sparse rewards make it difficult to obtain good results during learning, the reward mechanism is improved by adding real-time rewards and penalties as a supplement, which addresses the problems of long training time and unstable training. Relative angle, position, and distance information is used in place of absolute coordinates, allowing obstacles to be avoided more effectively. Unlike traditional hand-crafted obstacle-avoidance algorithms such as the grid method or the visibility-graph method, the deep reinforcement learning algorithm DQN can make autonomous decisions without prior knowledge and therefore has broader applicability. The technique can be applied to real-world scenarios such as warehouse unmanned vehicles, inspection robots, and drones.
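As a rough illustration of the three ideas summarized above (a Q-network replacing the Q-table for continuous states, a shaped reward with real-time rewards and penalties, and a state encoded with relative angle/position/distance), the following Python sketch is offered. It is not the authors' implementation: the environment interface (relative_features, shaped_reward), the layer sizes, the action count, and all reward weights are illustrative assumptions.

```python
# Minimal DQN sketch (PyTorch); all names and constants are assumptions, not the paper's code.
import math
import random
import torch
import torch.nn as nn

def relative_features(agent_xy, agent_heading, goal_xy, obstacle_xy):
    """Encode the state with relative distance/angle instead of absolute coordinates."""
    def rel(target):
        dx, dy = target[0] - agent_xy[0], target[1] - agent_xy[1]
        dist = math.hypot(dx, dy)
        angle = math.atan2(dy, dx) - agent_heading          # bearing relative to heading
        return [dist, math.sin(angle), math.cos(angle)]
    return torch.tensor(rel(goal_xy) + rel(obstacle_xy), dtype=torch.float32)

def shaped_reward(prev_goal_dist, goal_dist, obstacle_dist, reached, collided):
    """Sparse terminal reward plus dense real-time rewards/penalties (illustrative weights)."""
    if reached:
        return 10.0
    if collided:
        return -10.0
    r = 1.0 * (prev_goal_dist - goal_dist)                  # progress toward the goal
    if obstacle_dist < 0.5:                                 # penalty for approaching an obstacle
        r -= 0.2 * (0.5 - obstacle_dist)
    return r - 0.01                                         # small per-step cost

class QNet(nn.Module):
    """Function approximator replacing the Q-table for continuous states."""
    def __init__(self, state_dim=6, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, x):
        return self.net(x)

q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay, gamma = [], 0.99          # replay holds (s, a, r, s', done) tuples

def train_step(batch_size=32):
    """One DQN update: regress Q(s,a) toward r + gamma * max_a' Q_target(s',a')."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2, done = (torch.stack([b[0] for b in batch]),
                         torch.tensor([b[1] for b in batch]),
                         torch.tensor([b[2] for b in batch], dtype=torch.float32),
                         torch.stack([b[3] for b in batch]),
                         torch.tensor([b[4] for b in batch], dtype=torch.float32))
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s2).max(1).values
    loss = nn.functional.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a training loop one would build states with relative_features, score transitions with shaped_reward, append them to replay, call train_step each step, and periodically copy q_net's weights into target_net.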
Authors
刘庆杰 (LIU Qing-jie), 林友勇 (LIN You-yong), 李少利 (LI Shao-li)
CETHIK Research Institute, Hangzhou 310012, China
Source
《智能物联技术》 (Technology of IoT & AI), 2018, No. 2, pp. 18-22 (5 pages)
Keywords
deep reinforcement learning
DQN
autonomous decision-making
obstacle avoidance