Fast Fourier transform convolutional neural network accelerator based on overlap addition

Abstract: In convolutional neural networks (CNNs), traditional convolutional layers require an enormous amount of floating-point computation, and this compute intensity limits network execution speed, making it difficult to meet the real-time response requirements of complex applications. To reduce the floating-point operations needed for convolution, this work relies on the principle that convolution in the time domain is equivalent to pointwise multiplication in the frequency domain. The input feature map and the convolution kernel are transformed to the frequency domain by the fast Fourier transform (FFT) and multiplied point by point; the product is then transformed back to the time domain to obtain the convolution output. In common CNNs, the input feature map is much larger than the convolution kernel, which results in many invalid operations. The overlap-add (overlap addition) method is proposed to reduce these invalid calculations and further speed up network execution. This work designs a hardware accelerator for frequency domain convolution and verifies its efficiency on the Xilinx Zynq UltraScale+ MPSoC ZCU102 board. In terms of calculation time for visual geometry group 16 (VGG16) on the ImageNet dataset, the frequency domain convolution hardware accelerator is 8.5 times faster than traditional time domain convolution.
Source: The Journal of China Universities of Posts and Telecommunications (中国邮电高校学报(英文版)), EI, CSCD, 2024, Issue 5, pp. 71-84 (14 pages)
Funding: Supported by the Project of the State Grid Corporation of China in 2022 (5700-201941501A-0-0-00) and the National Natural Science Foundation of China (U21B2031).
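
The overlap-add idea summarized in the abstract can be illustrated in software. The following is a minimal NumPy sketch, not the paper's hardware design: the function names `direct_conv2d` and `overlap_add_conv2d`, the `tile` parameter, and the 8x8 tile size are all assumptions chosen for illustration. The input feature map is split into small tiles, each tile is zero-padded to tile size plus kernel size minus one, multiplied by the kernel spectrum in the frequency domain, transformed back, and the overlapping borders of neighboring tiles are summed.

```python
import numpy as np


def direct_conv2d(x, k):
    """Full 2-D linear convolution computed directly in the spatial domain."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap adds a shifted, scaled copy of the input.
            out[i:i + H, j:j + W] += k[i, j] * x
    return out


def overlap_add_conv2d(x, k, tile=8):
    """Full 2-D linear convolution via FFT with overlap-add tiling.

    `tile` is a hypothetical tuning parameter (side length of each input tile).
    """
    H, W = x.shape
    kh, kw = k.shape
    # FFT size per tile: large enough that circular convolution
    # equals linear convolution for one tile.
    fh, fw = tile + kh - 1, tile + kw - 1
    K = np.fft.rfft2(k, s=(fh, fw))  # kernel spectrum, computed once
    out = np.zeros((H + kh - 1, W + kw - 1))
    for r in range(0, H, tile):
        for c in range(0, W, tile):
            block = x[r:r + tile, c:c + tile]  # edge tiles may be smaller
            bh, bw = block.shape
            # Pointwise product in the frequency domain, then inverse FFT.
            y = np.fft.irfft2(np.fft.rfft2(block, s=(fh, fw)) * K, s=(fh, fw))
            # Add the tile result; borders overlap with neighboring tiles.
            out[r:r + bh + kh - 1, c:c + bw + kw - 1] += y[:bh + kh - 1, :bw + kw - 1]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((20, 13))  # stand-in for one input feature-map channel
k = rng.standard_normal((3, 3))    # stand-in for one 3x3 kernel
assert np.allclose(overlap_add_conv2d(x, k), direct_conv2d(x, k))
```

Because each tile uses a small FFT sized to the tile rather than to the whole feature map, the kernel is padded far less than it would be with a single large transform, which is the source of the reduction in invalid operations. Note that this sketch computes true convolution; CNN layers usually compute cross-correlation, which would correspond to flipping the kernel along both axes before the transform, and a "valid" layer output is the central crop of the full result.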