
Survey of FPGA based recurrent neural network accelerator
Abstract  Recurrent neural networks (RNN) have been increasingly applied in the machine learning field in recent years, especially for sequential learning tasks, where they outperform other neural networks such as CNN. However, RNN and its variants, such as LSTM and GRU, are fully connected networks with high computational and storage complexity, which makes their inference slow and difficult to deploy in products. On the one hand, traditional computing platforms such as the CPU are not suited to the large-scale matrix operations of RNN; on the other hand, the shared memory and global memory of the GPU hardware acceleration platform make the power consumption of GPU-based RNN accelerators relatively high. Owing to its parallel computing capability and low power consumption, the FPGA has increasingly been adopted as the hardware platform for RNN accelerators in recent years. This paper surveys recent FPGA-based RNN accelerators, summarizes the data optimization algorithms and hardware architecture design techniques they employ, and proposes directions for future research.
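The inference cost noted in the abstract is dominated by dense matrix-vector products inside every gate. A minimal pure-Python sketch of one LSTM time step (hypothetical tiny dimensions, not taken from any of the surveyed accelerators) makes that workload concrete:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    # Dense matrix-vector product: the dominant cost of RNN inference.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    W (4H x D), U (4H x H) and b (4H) hold the stacked parameters of the
    four gates: input i, forget f, cell candidate g, output o.
    """
    H = len(h)
    # Two large matvecs produce all 4H gate pre-activations.
    z = [wx + uh + bias for wx, uh, bias in
         zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(v) for v in z[0:H]]
    f = [sigmoid(v) for v in z[H:2 * H]]
    g = [math.tanh(v) for v in z[2 * H:3 * H]]
    o = [sigmoid(v) for v in z[3 * H:4 * H]]
    # Element-wise state update: cheap compared with the matvecs above.
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new
```

For input size D and hidden size H, each step performs 4H(D+H) multiply-accumulate operations in the two `matvec` calls, while the gate nonlinearities are element-wise and comparatively cheap; this imbalance is why the accelerator designs surveyed here concentrate FPGA resources on the matrix engines.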
Authors  GAO Chen (高琛), ZHANG Fan (张帆), National Digital Switching System Engineering and Technological Research Center, Zhengzhou 450002, China
Source  Chinese Journal of Network and Information Security (《网络与信息安全学报》), 2019, No. 4, pp. 1-13 (13 pages)
Funding  The National Natural Science Foundation of China (No. 61572520); The National Natural Science Foundation of China Innovation Research Group Project (No. 61521003)
Keywords  recurrent neural network; FPGA; accelerator

