Fault-tolerant neural network training framework based on client-server (cited by: 1)
Abstract: AIoT devices have recently been adopted in many areas of deep learning to achieve low-power, real-time inference. However, some manufacturing processes make these devices prone to soft errors during inference. For a neural network accelerator, which performs a large number of computations, such errors can produce many computing errors and a large loss of prediction accuracy, which is unacceptable for accuracy-sensitive applications such as autonomous drones. Conventional fault-tolerance techniques such as triple modular redundancy incur considerable power and performance overhead. This paper proposes a client-server collaborative fault-tolerant neural network training framework. During training, an AIoT processor with soft errors serves as the client, and the server learns the on-site computing errors from the application data of the AIoT device. Several representative neural network models were selected for the experiments. Compared with models trained offline, models trained with this method improve top-5 accuracy by 2.8% on average.
Authors: He Meng; Xu Dawen (School of Electronic Science & Applied Physics, Hefei University of Technology, Hefei 23009, Anhui, China)
Source: Microelectronics & Computer, 2021, No. 10, pp. 73-78 (6 pages)
Funding: National Natural Science Foundation of China, General Program (61874124).
Keywords: AIoT devices; neural network accelerator; fault tolerance; collaborative training
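
The client-server split described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the fault model, the network, or the communication protocol. The names `client_forward`, `server_update`, and `train`, the additive-noise fault model, and the logistic-regression "network" are all assumptions chosen to keep the example short. The idea shown is that the server computes its weight update from the client's possibly corrupted output, so the learned weights adapt to the errors the device actually produces.

```python
import math
import random


def client_forward(weights, x, rng, fault_rate=0.05, fault_scale=4.0):
    """Hypothetical client: compute a dot product on the faulty device.

    With probability fault_rate the result is perturbed. This additive
    noise is only a stand-in for a soft error; the real device fault
    model is not given in the abstract.
    """
    z = sum(w * xi for w, xi in zip(weights, x))
    if rng.random() < fault_rate:
        z += rng.uniform(-fault_scale, fault_scale)  # injected error
    return z


def server_update(weights, x, z_client, label, lr=0.1):
    """Hypothetical server: one SGD step on the logistic loss.

    The gradient uses the client's (possibly faulty) pre-activation
    rather than a clean recomputation of the forward pass.
    """
    z = max(-30.0, min(30.0, z_client))  # clamp to keep exp() well-behaved
    p = 1.0 / (1.0 + math.exp(-z))
    grad = p - label
    return [w - lr * grad * xi for w, xi in zip(weights, x)]


def train(samples, epochs=20, seed=0):
    """Alternate client forward passes and server updates over the data."""
    rng = random.Random(seed)  # seeded so fault injection is repeatable
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, label in samples:
            z = client_forward(weights, x, rng)
            weights = server_update(weights, x, z, label)
    return weights


if __name__ == "__main__":
    # Two toy samples: (features, binary label). The last feature is a bias.
    data = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], 0)]
    print(train(data))
```

In a real deployment the forward pass would run on the AIoT accelerator itself rather than on a simulated fault injector, and the model would be a full neural network, not a single linear unit.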