
Flexible and Efficient Hardware Design for Neural Network Pooling Layer

(Original title: 面向神经网络池化层的灵活高效硬件设计)
Abstract: Neural network accelerators have become a research hotspot in recent years, and the pooling layer is an important component of such accelerators. Designing the pooling layer with dedicated hardware design methods makes the design process fast and easy to modify, but two problems remain: existing pooling designs lack upward compatibility and so cannot be adapted to the latest neural networks, and they reuse little data, which keeps pooling performance low. To address this, a flexible and efficient hardware design for the neural network pooling layer is proposed. The design is implemented in the Verilog hardware description language and covers as many parameters of the pooling algorithm as possible so that it can adapt to the latest neural networks. Two-dimensional splitting and multi-data progressive processing give it high compatibility; line buffers improve its performance; and ping-pong buffering, pseudo-padding, and extension of specific pooling kernels further reduce resource usage. Pooling layers from several neural networks were used for experimental verification. At an operating frequency of 200 MHz, compared with a CPU (AMD TR Pro 3995WX), the design achieves a speedup of up to 536 times for max pooling and up to 11,248 times for average pooling. On the pooling layers of YOLOv5, it achieves a 27-fold speedup with 3.5 times the resources of a general scheme without data reuse. On the pooling layers of GoogleNet, it achieves a 555-fold speedup over an HLS design with nearly the same resources.
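The abstract does not spell out how two-dimensional splitting works. One plausible reading, offered here only as an assumption, is that a k x k pooling window is decomposed into a horizontal pass followed by a vertical pass, so that each row-wise partial result is reused by several windows. The Python sketch below illustrates that idea for max pooling without padding. The function names are hypothetical, and this is a software model for illustration, not the authors' Verilog design.

```python
def max_pool_1d(row, k, stride):
    """Sliding-window maximum over a 1D sequence (no padding)."""
    return [max(row[i:i + k]) for i in range(0, len(row) - k + 1, stride)]


def max_pool_2d_separable(image, k, stride):
    """k x k max pooling as a horizontal pass followed by a vertical pass.

    Assumption: this is one possible interpretation of "two-dimensional
    splitting"; the abstract does not confirm the exact scheme.
    """
    # Horizontal pass: pool every row independently.
    rows = [max_pool_1d(r, k, stride) for r in image]
    # Vertical pass: pool every column of the intermediate result.
    cols = [max_pool_1d(list(c), k, stride) for c in zip(*rows)]
    # Transpose back to row-major order.
    return [list(r) for r in zip(*cols)]


def max_pool_2d_direct(image, k, stride):
    """Reference version: take the maximum over each full k x k window."""
    h, w = len(image), len(image[0])
    return [
        [max(image[y + dy][x + dx] for dy in range(k) for dx in range(k))
         for x in range(0, w - k + 1, stride)]
        for y in range(0, h - k + 1, stride)
    ]


if __name__ == "__main__":
    image = [
        [1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 0],
        [3, 8, 4, 5],
    ]
    print(max_pool_2d_separable(image, k=2, stride=2))
    print(max_pool_2d_direct(image, k=2, stride=2))
```

Both functions return the same result for any window size and stride. At stride 1, the separable form needs roughly 2(k - 1) comparisons per output instead of k² - 1, ignoring borders, which is the kind of data reuse the abstract refers to.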
Authors: HE Zeng (何增); ZHU Guoquan (朱国权); YUE Keqiang (岳克强) (School of Electronic Information, Hangzhou Dianzi University, Hangzhou 310018, China; Intelligent Computing Hardware Research Center, Zhijiang Laboratory, Hangzhou 311100, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2023, No. 22, pp. 315-321 (7 pages)
Funding: Key Research and Development Program of Zhejiang Province (2022C01048); Exploratory Project of Zhijiang Laboratory (2022PF0AN01).
Keywords: flexible and efficient pooling; hardware acceleration; Verilog HDL; data reuse
