

Deep learning hardware acceleration based on general vector DSP
Abstract: As deep learning (DL) plays an increasingly important role in many fields, designing high-performance, low-power, low-latency DL hardware accelerators has become a research focus in computer architecture. Starting from the structure and optimization methods of DL algorithm models, this paper analyzes the difficulties and challenges of implementing DL in hardware and compares the strengths and weaknesses of current mainstream DL hardware acceleration platforms. It then proposes a DL hardware acceleration scheme based on the FT-Matrix (飞腾–迈创) general-purpose vector DSP and describes its acceleration techniques, such as vector broadcasting and matrix conversion. Finally, addressing the shortcomings of current general-purpose vector DSP acceleration, it discusses in depth optimization techniques such as reconfigurable computing arrays that support both general vector computation and dedicated DL computation.
Authors: Huili WANG (王慧丽), Yang GUO (郭阳), Wanxia QU (屈婉霞) (School of Computer, National University of Defense Technology, Changsha 410073, China)
Source: Scientia Sinica Informationis (《中国科学:信息科学》), 2019, no. 3, pp. 256-276 (21 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (grant nos. 61832018 and 61572025)
Keywords: deep learning; architecture; hardware design; accelerator; digital signal processor (DSP)
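The abstract names vector broadcasting as one of the acceleration techniques without spelling it out. As a rough illustration only, and not the authors' FT-Matrix implementation, the Python sketch below models the common broadcast-based way of computing a matrix product on a vector unit: one element of A is replicated across all lanes and multiplied lane-wise with a contiguous slice of a row of B, so each simulated vector operation yields VLEN partial sums of one output row. The lane count VLEN and the helper names vbroadcast, vfma and gemm_broadcast are hypothetical.

```python
# Hypothetical lane count: number of elements one simulated vector instruction handles.
VLEN = 4

def vbroadcast(s):
    """Replicate one scalar into all VLEN lanes (models a scalar-to-vector broadcast)."""
    return [s] * VLEN

def vfma(acc, a, b):
    """Lane-wise multiply-accumulate: returns acc[l] + a[l] * b[l] for every lane l."""
    return [x + y * z for x, y, z in zip(acc, a, b)]

def gemm_broadcast(A, B):
    """C = A x B for row-major lists of rows; len(B[0]) must be a multiple of VLEN."""
    M, K, N = len(A), len(B), len(B[0])
    assert N % VLEN == 0, "this sketch assumes N is a multiple of VLEN"
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(0, N, VLEN):          # one vector-wide tile of row i of C
            acc = vbroadcast(0.0)
            for k in range(K):
                a = vbroadcast(A[i][k])      # one element of A, broadcast to every lane
                b = B[k][j:j + VLEN]         # VLEN contiguous elements of row k of B
                acc = vfma(acc, a, b)
            C[i][j:j + VLEN] = acc
    return C

A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]
print(gemm_broadcast(A, B))  # → [[1.0, 2.0, 3.0, 6.0], [4.0, 5.0, 6.0, 15.0]]
```

In this formulation each element of A is fetched once per output tile and reused across all lanes, while B is read in contiguous vector-width chunks, so no transpose or gather of B is required.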