Abstract
In convolutional neural networks (CNNs), traditional convolutional layers require an enormous number of floating-point operations, and this compute-intensive workload limits the execution speed of the network, making it challenging to meet the real-time response requirements of complex applications. This work reduces the floating-point computation of convolution by exploiting the principle that convolution in the time domain is equivalent to pointwise multiplication in the frequency domain. The input feature map and the convolution kernel are transformed to the frequency domain by the fast Fourier transform (FFT) and multiplied point by point; the frequency-domain result is then transformed back to the time domain to obtain the convolution output. In common CNNs, the input feature map is much larger than the convolution kernel, which results in many invalid operations. The overlap-add method is proposed to reduce these invalid calculations and further speed up network execution. This work designs a hardware accelerator for frequency-domain convolution and verifies its efficiency on the Xilinx Zynq UltraScale+ MPSoC ZCU102 board. For Visual Geometry Group 16 (VGG16) on the ImageNet dataset, the frequency-domain convolution accelerator is 8.5 times faster in calculation time than traditional time-domain convolution.
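The frequency-domain pipeline described above (FFT, pointwise multiplication, inverse FFT, with overlap-add to handle an input much longer than the kernel) can be illustrated with a minimal software sketch. It is a one-dimensional, pure-Python illustration under stated assumptions and not the accelerator design itself: the function names `fft`, `ifft`, `direct_conv`, and `overlap_add_conv` and the `block` parameter are hypothetical, the FFT is a simple recursive radix-2 routine, and the CNN case would apply the same idea in two dimensions.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse FFT, including the 1/n scaling."""
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]


def direct_conv(x, h):
    """Reference time-domain linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add_conv(x, h, block=4):
    """Linear convolution via short FFTs on input blocks (overlap-add)."""
    m = len(h)
    # Smallest power-of-two FFT size that holds one block's linear convolution.
    n_fft = 1
    while n_fft < block + m - 1:
        n_fft *= 2
    # The kernel is transformed once and reused for every block.
    H = fft([complex(v) for v in h] + [0j] * (n_fft - m))
    y = [0.0] * (len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        X = fft([complex(v) for v in seg] + [0j] * (n_fft - len(seg)))
        # Pointwise multiplication in the frequency domain.
        yb = ifft([a * b for a, b in zip(X, H)])
        # Add the block result into the output; tails of adjacent blocks overlap.
        for k in range(len(seg) + m - 1):
            y[start + k] += yb[k].real
    return y
```

Calling `overlap_add_conv(x, h, block=4)` on a short list `x` and kernel `h` should agree with `direct_conv(x, h)` up to floating-point rounding. Keeping each FFT close to the kernel size, rather than padding the kernel to the full input size, is what avoids the wasted operations mentioned in the abstract.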
Funding
Supported by the Project of the State Grid Corporation of China in 2022 (5700-201941501A-0-0-00) and the National Natural Science Foundation of China (U21B2031).