
Design of Hardware Accelerator for Embedded Convolutional Neural Network
Abstract As neural network models have grown more complex in recent years, the memory required for convolutional neural network (CNN) inference has become too large for straightforward deployment on embedded devices. To address this problem, a dynamic multi-precision fixed-point data quantization hardware structure is proposed, which uses fixed-point data in place of floating-point data to perform the convolution operations of post-training inference. The results show that, compared with a static quantization strategy, a hardware architecture combining 16-bit dynamic fixed-point quantization with parallel convolution achieves data accuracy of up to 97.96%. The hardware unit occupies only 13740 gates, and both memory footprint and bandwidth requirements are halved. Compared with a Cortex M4 performing convolution on floating-point data, the hardware acceleration unit improves performance by more than 90%.
Authors TANG Rui; JIAO Jiye; XU Huahao (School of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710121, China)
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2021, No. 4, pp. 252-257 (6 pages)
Funding National Natural Science Foundation of China (61874087).
Keywords convolutional neural network; embedded devices; dynamic multi-precision fixed-point data quantization; parallel convolutional operation hardware architecture
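
The abstract describes replacing floating-point data with 16-bit dynamic fixed-point data for convolution. The paper's actual hardware design is not reproduced on this page, so the following is only a minimal software sketch of the general idea, assuming a per-tensor fractional length chosen from the largest magnitude in the data. The names `choose_frac_bits`, `quantize`, and `fixed_dot` are hypothetical helpers introduced for illustration, not the authors' implementation.

```python
import math

WORD_BITS = 16  # word length taken from the abstract


def choose_frac_bits(values, word_bits=WORD_BITS):
    """Pick a fractional length so the largest magnitude fits in the word.

    Assumption: one sign bit, and one fractional length shared by the
    whole tensor. This is the "dynamic" part of the sketch.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return word_bits - 1
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so e integer bits suffice.
    _, int_bits = math.frexp(max_abs)
    return word_bits - 1 - int_bits


def quantize(values, frac_bits, word_bits=WORD_BITS):
    """Scale by 2**frac_bits, round, and saturate to the signed word range."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]


def fixed_dot(xq, wq, x_frac, w_frac):
    """One convolution output as an integer multiply-accumulate.

    The products are summed in a wide integer accumulator and rescaled
    once at the end by the combined fractional length.
    """
    acc = sum(a * b for a, b in zip(xq, wq))
    return acc / 2.0 ** (x_frac + w_frac)


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.125]   # illustrative values, not from the paper
    inputs = [1.5, 2.0, -3.0]
    w_frac = choose_frac_bits(weights)
    x_frac = choose_frac_bits(inputs)
    result = fixed_dot(quantize(inputs, x_frac), quantize(weights, w_frac),
                       x_frac, w_frac)
    print(w_frac, x_frac, result)
```

In this sketch the weights and the inputs receive different fractional lengths because their value ranges differ, whereas a static scheme would fix a single format in advance. Storing each value in 16 bits instead of a 32-bit float is also consistent with the halved memory footprint and bandwidth reported in the abstract, assuming the floating-point baseline is 32-bit.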