
基于多层忆阻脉冲神经网络的强化学习及应用 (original Chinese title; cited 10 times)

A Novel Reinforcement Learning Algorithm Based on Multilayer Memristive Spiking Neural Network With Applications
Abstract: Combining artificial neural networks (ANNs) with reinforcement learning algorithms has markedly improved the learning ability and efficiency of agents. However, these algorithms consume substantial computing resources and are difficult to implement in hardware. Spiking neural networks (SNNs), which transmit information through spike signals, are energy-efficient and strongly bio-inspired. They also lend themselves to hardware acceleration of reinforcement learning and can strengthen the autonomous learning ability of embedded agents. At present, however, training SNNs is complex, and designing and implementing such networks remains challenging. By introducing the memristor, an ideal device for realizing artificial synapses, this paper proposes a hardware-friendly reinforcement learning algorithm based on a multilayer memristive spiking neural network. Specifically, spiking neurons are designed for data-to-spike conversion; the spike-timing-dependent plasticity (STDP) rule is improved so that the SNN integrates with the reinforcement learning algorithm, and corresponding memristive synapses are designed; and a dynamically adjustable network structure is constructed to improve learning efficiency. Finally, simulation experiments and comparative analyses on CartPole-v0 (inverted pendulum) and MountainCar-v0 (mountain car) from OpenAI Gym verify the effectiveness of the scheme and its advantages over conventional reinforcement learning methods.
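The abstract combines three mechanisms: converting observations into spikes, an STDP rule modified so that the reinforcement signal gates learning, and memristive synapses. This page does not give the paper's neuron model, learning rule, or device model. The following is therefore only a minimal NumPy sketch of one common way such pieces fit together, under stated assumptions: rate coding stands in for the data-to-spike neurons, a generic reward-modulated STDP (R-STDP) with an eligibility matrix stands in for the improved STDP rule, and a soft-bounded conductance stands in for the memristor. All function names and constants (`encode_rate`, `lif_layer`, `stdp_eligibility`, `apply_reward`, `G_MIN`, `G_MAX`, `A_PLUS`, `A_MINUS`, `TAU_TRACE`) and the CartPole feature ranges are hypothetical. The dynamically adjustable network structure is not modeled.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
G_MIN, G_MAX = 0.1, 1.0         # hypothetical memristor conductance bounds (arbitrary units)
A_PLUS, A_MINUS = 0.05, 0.055   # STDP potentiation / depression amplitudes
TAU_TRACE = 20.0                # spike-trace time constant, in time steps


def encode_rate(x, low, high, n_steps, rng):
    """Data-to-spike conversion by rate coding (an assumed scheme).

    Each feature is min-max normalized to [0, 1] and used as a per-step
    firing probability. Returns an (n_steps, n_features) array of 0/1 spikes.
    """
    p = np.clip((np.asarray(x, dtype=float) - low) / (high - low), 0.0, 1.0)
    return (rng.random((n_steps, p.size)) < p).astype(float)


def lif_layer(in_spikes, G, v_th=1.0, leak=0.9):
    """Leaky integrate-and-fire neurons driven through conductance matrix G."""
    v = np.zeros(G.shape[0])
    out = np.zeros((in_spikes.shape[0], G.shape[0]))
    for t, s in enumerate(in_spikes):
        v = leak * v + G @ s          # leak, then integrate weighted input
        fired = v >= v_th
        out[t, fired] = 1.0
        v[fired] = 0.0                # reset neurons that spiked
    return out


def stdp_eligibility(pre, post, tau=TAU_TRACE):
    """Pair-based STDP accumulated into an eligibility matrix (n_post, n_pre).

    Pre-before-post pairs add potentiation; post-before-pre pairs add
    depression. Weights are NOT changed here; the reward decides that later.
    """
    decay = np.exp(-1.0 / tau)
    x_pre = np.zeros(pre.shape[1])
    x_post = np.zeros(post.shape[1])
    elig = np.zeros((post.shape[1], pre.shape[1]))
    for s_pre, s_post in zip(pre, post):
        x_pre *= decay
        x_post *= decay
        elig += A_PLUS * np.outer(s_post, x_pre)    # earlier pre, current post: LTP
        elig -= A_MINUS * np.outer(x_post, s_pre)   # earlier post, current pre: LTD
        x_pre += s_pre
        x_post += s_post
    return elig


def apply_reward(G, elig, reward, lr=0.1):
    """Reward-modulated update on a bounded, memristor-like conductance.

    Soft bounds scale potentiation by (G_MAX - G) and depression by
    (G - G_MIN), a simple stand-in for device saturation. The final clip
    guards against steps large enough to overshoot either bound.
    """
    dw = lr * reward * elig
    G = G + np.where(dw > 0, dw * (G_MAX - G), dw * (G - G_MIN))
    return np.clip(G, G_MIN, G_MAX)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    high = np.array([2.4, 3.0, 0.21, 3.0])   # assumed CartPole feature ranges
    obs = np.array([0.0, 0.5, 0.05, -0.5])   # one illustrative observation
    G = np.full((2, 4), 0.5)                 # 2 output neurons, one per action
    pre = encode_rate(obs, -high, high, n_steps=50, rng=rng)
    post = lif_layer(pre, G)
    action = int(post.sum(axis=0).argmax())  # action whose neuron spiked most
    G = apply_reward(G, stdp_eligibility(pre, post), reward=1.0)
    print(action, G.round(3))
```

Keeping the STDP result as an eligibility matrix and applying it only once a reward is known is what lets an unsupervised plasticity rule serve reinforcement learning: a positive reward reinforces the spike pairings that preceded it, and a negative reward reverses them.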
Authors: ZHANG Yao-Zhong (张耀中), HU Xiao-Fang (胡小方), ZHOU Yue (周跃), DUAN Shu-Kai (段书凯) (School of Computer and Information Science, Southwest University, Chongqing 400715; School of Artificial Intelligence, Southwest University, Chongqing 400715; Brain-inspired Computing and Intelligent Control of Chongqing Key Laboratory, Chongqing 400715; College of Electronic and Information Engineering, Southwest University, Chongqing 400715)
Source: Acta Automatica Sinica (《自动化学报》), 2019, no. 8, pp. 1536-1547 (12 pages); indexed in EI, CSCD, and the Peking University Core Journals list
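Funding: Supported by the National Natural Science Foundation of China (61601376, 61672436), the Fundamental Research Funds for the Central Universities (XDJK2019C034), the Chongqing Research Program of Basic Research and Frontier Technology (cstc2016jcyjA0547), the China Postdoctoral Science Foundation (2018T110937), the Chongqing Postdoctoral Science Foundation (Xm2017039), and the National Undergraduate Training Program for Innovation and Entrepreneurship (201810635017)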
Keywords: reinforcement learning; spiking neural network (SNN); spike-timing-dependent plasticity (STDP); memristor