Abstract
To address the poor adaptive ability of mobile robots in obstacle avoidance, this paper combines the evolutionary idea of the Genetic Algorithm (GA) with Adaptive Heuristic Critic (AHC) learning and Operant Conditioning (OC) theory, and proposes an Evolutionary Operant Behavior Learning Model (EOBLM) with which a mobile robot learns obstacle avoidance in an unknown environment. The model is a modified version of the AHC learning architecture. The Adaptive Critic Element (ACE) is implemented as a multi-layer feedforward neural network whose weights are updated by the TD(λ) algorithm and gradient descent. This stage of learning generates tropism (orientation) information, which serves as intrinsic motivation and determines the direction of evolution. The Adaptive Selection Element (ASE) optimizes operant behavior to achieve the best mapping from state to action. The optimization proceeds in two stages. In the first stage, the information entropy obtained from the OC learning algorithm is used as the individual fitness, and the GA searches for the optimal individual. In the second stage, the OC learning algorithm selects the optimal operant behavior within the optimal individual and yields a new information entropy value. Simulation experiments on mobile robot obstacle avoidance show that the EOBLM enables the robot to actively learn obstacle avoidance for path planning through continual interaction with the unknown environment, and that its self-learning and self-adaptive abilities are stronger than those of the traditional AHC method.
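The abstract names three building blocks: a TD(λ) critic update, information entropy used as GA fitness, and selection of the best individual. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. A linear critic stands in for the paper's multi-layer network, the function names and the parameters `alpha`, `gamma`, and `lam` are hypothetical, and the rule that lower entropy means a fitter individual is an assumption the abstract does not confirm.

```python
import math


def action_entropy(probs):
    """Shannon entropy (in bits) of an action-selection distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def td_lambda_step(weights, traces, features, next_features, reward,
                   alpha=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) update for a linear critic V(s) = w . phi(s).

    Assumption: a linear critic replaces the multi-layer network; for a
    linear model the gradient of V with respect to w is simply phi(s).
    """
    v = sum(w * f for w, f in zip(weights, features))
    v_next = sum(w * f for w, f in zip(weights, next_features))
    delta = reward + gamma * v_next - v  # TD error
    # Decay the eligibility traces, then add the current gradient.
    traces = [gamma * lam * e + f for e, f in zip(traces, features)]
    # Gradient-descent style weight update scaled by the TD error.
    weights = [w + alpha * delta * e for w, e in zip(weights, traces)]
    return weights, traces, delta


def select_best(population):
    """Return the individual with the lowest action entropy.

    Assumption: each individual is an action-probability vector, and lower
    entropy is treated as higher fitness.
    """
    return min(population, key=action_entropy)
```

In this sketch a uniform distribution over two actions has an entropy of 1 bit and a deterministic choice has an entropy of 0, so `select_best` favors individuals whose action choice has become more decisive.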
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2013, Issue 8, pp. 2283-2288 (6 pages)
Funding
Zhejiang Provincial Youth Science Foundation (LQ13F030012)
Zhejiang A&F University Talent Start-up Project (2013FR023)
Keywords
mobile robot
Adaptive Heuristic Critic (AHC)
Operant Conditioning (OC)
Genetic Algorithm (GA)
obstacle avoidance