
Fault-tolerant design of a deep learning accelerator based on recomputing
Abstract: Thanks to their high parallelism and simple communication, 2D computing arrays in deep learning accelerators (DLAs) often handle the bulk of convolution computation. A hardware fault in the array causes computation errors, which in turn can sharply reduce prediction accuracy. To repair faults in 2D computing arrays, this paper proposes a recomputing architecture (RCA) for fault-tolerant DLAs. Unlike conventional strategies that repair faults on the fly by adding redundancy inside the array, RCA provides a set of redundancy-based recomputing units (RCUs) that recompute the work of faulty units one-to-one in later cycles. Experimental results show that, compared with previous fault-tolerant schemes, the proposed method offers higher fault-repair capability and scalability while occupying less chip area.
Authors: WANG Qianlong (王乾龙); XU Dawen (许达文) (School of Electronic Science and Applied Physics, Hefei University of Technology, Hefei 230601, China)
Source: Journal of Hefei University of Technology: Natural Science (《合肥工业大学学报(自然科学版)》), 2023, No. 1, pp. 54-59 (6 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (61834006).
Keywords: recomputing architecture (RCA); deep learning accelerator (DLA); fault tolerance; recomputing
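The recomputing idea described in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design. It assumes, purely for illustration, that the array computes a matrix product with one processing element (PE) per output element, that a faulty PE outputs zero, and that the set of faulty PEs is already known. The names `pe_array_matmul`, `recompute_faulty`, and `num_rcus` are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[int]]


def pe_array_matmul(a: Matrix, b: Matrix, faulty_pes: Set[Tuple[int, int]]) -> Matrix:
    """Toy model of a 2D PE array computing a @ b.

    Assumption: PE (i, j) produces output element (i, j), and a faulty PE
    leaves its output at zero (an illustrative fault model).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if (i, j) in faulty_pes:
                continue  # faulty PE: result is lost
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return out


def recompute_faulty(
    a: Matrix,
    b: Matrix,
    out: Matrix,
    faulty_pes: Set[Tuple[int, int]],
    num_rcus: int,
) -> Tuple[Matrix, int]:
    """Repair lost outputs in later passes, one faulty PE per RCU per pass.

    Returns the repaired output and the number of extra passes needed.
    """
    if num_rcus < 1:
        raise ValueError("at least one recomputing unit is required")
    pending = sorted(faulty_pes)
    repaired = [row[:] for row in out]
    passes = 0
    while pending:
        # Each RCU is paired one-to-one with a faulty PE for this pass.
        batch, pending = pending[:num_rcus], pending[num_rcus:]
        for i, j in batch:
            repaired[i][j] = sum(a[i][k] * b[k][j] for k in range(len(b)))
        passes += 1
    return repaired, passes


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
faulty = {(0, 1), (1, 0)}
damaged = pe_array_matmul(a, b, faulty)            # [[19, 0], [0, 50]]
fixed, extra_passes = recompute_faulty(a, b, damaged, faulty, num_rcus=1)
print(fixed, extra_passes)                          # [[19, 22], [43, 50]] 2
```

In this toy model, the number of extra passes grows as the number of faulty PEs divided by the number of RCUs, which is the latency cost of repairing faults later rather than in place.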
