
Design of CNN accelerator with multi-core based on FPGA
Original title: 基于FPGA的多核可扩展卷积加速器设计
Cited by: 1
Abstract: To address the low computational efficiency and energy efficiency of convolutional neural networks (CNNs), an FPGA-based convolution accelerator that takes fixed-point data as input is proposed and designed. The accelerator supports convolution on dynamically quantized 8-bit fixed-point data and improves computational efficiency through a tiling strategy and an improved loop ordering. It also supports activation, batch normalization (BN), pooling, and fully connected computations. Following a hardware/software co-design approach, an SoC system comprising the convolution accelerator and an ARM processor is designed. A method for extending the accelerator to multiple cores is proposed to increase computing performance and to make the design easier to port across FPGA platforms. Deployed on a Xilinx ZCU102 development board, the single-core accelerator reaches 153.6 GOP/s; with four and eight compute cores, performance rises to 614.4 GOP/s and 1024 GOP/s, respectively.
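The abstract names two techniques without detailing them: dynamic 8-bit fixed-point quantization and a tiled convolution with a reordered loop nest. The sketch below illustrates both in plain Python under stated assumptions. It assumes a per-tensor power-of-two scale for the quantization and a single-channel, stride-1, no-padding convolution, and every function name and the tile size are hypothetical. It is not the authors' hardware design, only a software illustration of the general idea.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Pick fractional bits so the largest magnitude fits the signed range.
    Assumption: one power-of-two scale per tensor ("dynamic" fixed point)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits for the integer part
    return total_bits - 1 - int_bits               # one bit is the sign

def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]

def conv2d_direct(x, w):
    """Reference convolution (cross-correlation form), no tiling."""
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[i + a][j + b] * w[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

def conv2d_tiled(x, w, tile=2):
    """Same result, computed one output tile at a time.
    Inside a tile the weight loops are outermost, so each weight is
    fetched once and reused for every output in the tile."""
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]      # integer accumulators
    for ti in range(0, out_h, tile):               # tile rows
        for tj in range(0, out_w, tile):           # tile columns
            for a in range(k):
                for b in range(k):
                    w_ab = w[a][b]                 # reused across the tile
                    for i in range(ti, min(ti + tile, out_h)):
                        for j in range(tj, min(tj + tile, out_w)):
                            out[i][j] += x[i + a][j + b] * w_ab
    return out

# Usage: quantize a small weight list, then check tiled against direct.
weights = [0.5, -0.25, 0.125, 0.0]
fb = choose_frac_bits(weights)
wq = quantize(weights, fb)
w2d = [wq[0:2], wq[2:4]]
x2d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
assert conv2d_tiled(x2d, w2d) == conv2d_direct(x2d, w2d)
```

In hardware, the tile size would typically be chosen so that one tile of inputs, weights, and partial sums fits in on-chip buffers; here it only controls the loop blocking.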
Authors: ZHANG Kun-ning (张坤宁); ZHAO Shuo (赵烁); SUN Qing-bin (孙庆斌); DENG Ning (邓宁); HE Hu (何虎) (Institute of Microelectronics, Tsinghua University, Beijing 100084, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2021, No. 6, pp. 1592-1598 (7 pages)
Funding: National Natural Science Foundation of China (91846303).
Keywords: convolution acceleration; data reuse; parallel computation; multi-core expansion; hardware and software co-design