Abstract
Many embedded-system applications need to perform artificial intelligence tasks. In recent years, deep learning has achieved great success in artificial intelligence, and this success brings new opportunities for artificial intelligence applications in embedded systems. Using BWDSP, a 32-bit DSP processor with a multi-cluster architecture, this paper studies parallel algorithms for the convolution computation, which accounts for more than 90% of the total computation time of a convolutional neural network (CNN), and designs a multi-cluster parallel convolution algorithm. Benchmarks show that the algorithm reaches 2.27 GMACS, which is 9.5 times the performance of a conventional GEMM algorithm and 5.7 times that of a vectorized algorithm. Compared with several FPGA-based convolution algorithms, its average performance per computing unit is 1.63 to 10.85 times theirs.
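For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch in plain Python of a direct convolution, the im2col-plus-GEMM formulation, and the multiply-accumulate (MAC) count on which a GMACS figure is based. It is not the paper's BWDSP algorithm. The function names, the stride-1 and no-padding setting, and the tensor layouts are all assumptions made for illustration.

```python
def _shapes(x, w):
    # x: [C][H][W] input feature maps, w: [K][C][R][S] filters (assumed layout)
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    return C, H, W, K, R, S


def conv2d_direct(x, w):
    """Direct convolution (cross-correlation form), stride 1, no padding."""
    C, H, W, K, R, S = _shapes(x, w)
    P, Q = H - R + 1, W - S + 1  # output height and width
    out = [[[0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):
        for p in range(P):
            for q in range(Q):
                acc = 0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][p + r][q + s] * w[k][c][r][s]
                out[k][p][q] = acc
    return out


def conv2d_im2col_gemm(x, w):
    """Same convolution, expressed as im2col followed by a matrix multiply."""
    C, H, W, K, R, S = _shapes(x, w)
    P, Q = H - R + 1, W - S + 1
    # im2col: one flattened receptive field (length C*R*S) per output position
    cols = [[x[c][p + r][q + s]
             for c in range(C) for r in range(R) for s in range(S)]
            for p in range(P) for q in range(Q)]
    # Filter matrix: K rows, each a flattened filter of length C*R*S
    wmat = [[w[k][c][r][s]
             for c in range(C) for r in range(R) for s in range(S)]
            for k in range(K)]
    # GEMM: (K x CRS) times (CRS x PQ), written as row-by-column dot products
    flat = [[sum(a * b for a, b in zip(wrow, col)) for col in cols]
            for wrow in wmat]
    # Reshape each row of length P*Q back into a P x Q output map
    return [[row[p * Q:(p + 1) * Q] for p in range(P)] for row in flat]


def mac_count(C, H, W, K, R, S):
    """Number of multiply-accumulates for one stride-1, unpadded layer."""
    return K * (H - R + 1) * (W - S + 1) * C * R * S
```

Dividing `mac_count(...)` by the measured run time in seconds and by 10^9 gives a throughput in GMACS, the unit used for the figures quoted above.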
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2018, No. 3, pp. 520-524 (5 pages)
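Funding
Supported by the National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (核高基), grant 2012ZX01034-001-001.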