A General Parallel Convolution Algorithm for Sunway TaihuLight

Abstract: The parallel convolution algorithm in the deep learning library of Sunway TaihuLight suffers from a batch-size limitation, and the traditional GEMM-based convolution algorithm is inefficient on its hardware architecture. To solve these problems, a general parallel convolution algorithm without batch limitation is proposed for the Sunway heterogeneous many-core processor. The algorithm combines asynchronous DMA memory access with register communication between slave cores, reduces slave-core memory access overhead through data reuse and software pipelining, and fully exploits the floating-point capability of the slave cores through manual vectorization. Experimental results show that the proposed algorithm achieves better acceleration than the basic 7-level nested loop algorithm, the GEMM algorithm, and the MKL-DNN algorithm on the Intel platform.
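
To make the memory access optimizations mentioned in the abstract concrete, the sketch below illustrates how asynchronous DMA loads can be double buffered and software pipelined on a slave core so that the transfer of the next input tile overlaps with computation on the current one. This is a minimal illustration and not the paper's code: dma_get_async(), dma_wait() and compute_tile() are hypothetical stand-ins for the platform's asynchronous DMA interface and for the manually vectorized convolution kernel.

/*
 * Minimal sketch of double-buffered asynchronous DMA with software
 * pipelining on a slave core.  dma_get_async(), dma_wait() and
 * compute_tile() are hypothetical placeholders, not the Sunway API.
 */
#include <stddef.h>

#define TILE 256                              /* elements per tile in local store */

extern void dma_get_async(float *ldm_dst, const float *mem_src,
                          size_t bytes, volatile int *reply);
extern void dma_wait(volatile int *reply);    /* block until the transfer replies */
extern void compute_tile(const float *in, float *out, size_t n);

void conv_pipeline(const float *gmem_in, float *gmem_out, size_t ntiles)
{
    static float buf[2][TILE];                /* two local buffers for double buffering */
    volatile int reply[2] = {0, 0};
    float out[TILE];

    /* Prologue: start loading tile 0 before the steady-state loop. */
    dma_get_async(buf[0], gmem_in, TILE * sizeof(float), &reply[0]);

    for (size_t t = 0; t < ntiles; ++t) {
        size_t cur = t & 1, nxt = cur ^ 1;

        /* Issue the load of tile t+1 so it overlaps with computing tile t. */
        if (t + 1 < ntiles) {
            reply[nxt] = 0;
            dma_get_async(buf[nxt], gmem_in + (t + 1) * TILE,
                          TILE * sizeof(float), &reply[nxt]);
        }

        dma_wait(&reply[cur]);                /* wait until tile t is in local store */
        compute_tile(buf[cur], out, TILE);    /* vectorized inner convolution kernel */

        /* Results would be written back here with an asynchronous DMA put. */
        (void)gmem_out;
    }
}

With two buffers, the DMA engine fills one tile while the compute kernel consumes the other, which is the kind of overlap the abstract attributes to combining asynchronous DMA with software pipelining.
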
Authors: 舒嘉明 (SHU Jiaming), 安虹 (AN Hong), 武铮 (WU Zheng), 陈俊仕 (CHEN Junshi), School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, China
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD and the Peking University Core list, 2019, Issue 12, pp. 153-159 (7 pages)
Fund: National Key Research and Development Program of China (2016YFB1000403)
Keywords: Sunway TaihuLight; Convolutional Neural Network (CNN); data reuse; software pipelining; batch limitation