Funding: the National Key Research and Development Program of China (No. 2022YFB4501601), the National Natural Science Foundation of China (Nos. 62102398, U20A20227, 62222214, 62002338, U22A2028, U19B2019), the Chinese Academy of Sciences Project for Young Scientists in Basic Research (No. YSBR-029), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences.
Abstract: Quantized training has proven to be a prominent method for training deep neural networks under limited computational resources. It uses low bit-width arithmetic with a proper scaling factor to achieve negligible accuracy loss. Cambricon-Q is an ASIC design proposed to support quantized training efficiently, and it achieves significant performance improvement. However, there are still two caveats in the design. First, Cambricon-Q instances with different hardware specifications may introduce different numerical errors, resulting in non-reproducible behaviors that can become a major concern in critical applications. Second, Cambricon-Q cannot leverage data sparsity, from which considerable cycles could still be squeezed out. To address these caveats, the acceleration core of Cambricon-Q is redesigned to support fine-grained irregular data processing. The new design not only enables acceleration on sparse data, but also performs local dynamic quantization over contiguous value ranges (which is hardware independent) instead of contiguous addresses (which depends on hardware factors). Experimental results show that the accuracy loss of the method remains negligible, and the accelerator achieves a 1.61× performance improvement over Cambricon-Q with about a 10% energy increase.
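To make the distinction concrete, the following is a minimal NumPy sketch contrasting quantization groups formed by contiguous memory addresses with groups formed by contiguous value ranges. The symmetric int8 scheme, group sizes, and function names are illustrative assumptions, not the actual Cambricon-Q datapath.

```python
import numpy as np

def quantize_group(values, bits=8):
    """Symmetric quantization of one group with a shared scaling factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(values).max() / qmax if values.size else 1.0
    scale = scale or 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(values / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def quantize_by_address(tensor, group_size=64, bits=8):
    """Hardware-dependent grouping: consecutive addresses share one scale."""
    flat = tensor.ravel()
    return [quantize_group(flat[s:s + group_size], bits)
            for s in range(0, flat.size, group_size)]

def quantize_by_value_range(tensor, num_ranges=4, bits=8):
    """Hardware-independent grouping: elements whose magnitudes fall in the
    same value range share one scale, regardless of memory location."""
    flat = tensor.ravel()
    edges = np.quantile(np.abs(flat), np.linspace(0, 1, num_ranges + 1))
    groups = np.clip(np.digitize(np.abs(flat), edges[1:-1]), 0, num_ranges - 1)
    return [quantize_group(flat[groups == g], bits) for g in range(num_ranges)]

x = np.random.randn(1024).astype(np.float32)
addr_groups = quantize_by_address(x)        # grouping depends on buffer layout
range_groups = quantize_by_value_range(x)   # grouping depends only on values
```

Because value-range grouping is defined purely by the data, the same tensor is quantized identically regardless of how a particular hardware configuration tiles or buffers it, which is the reproducibility property the redesign targets.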
Funding: the National Natural Science Foundation of China (No. 61732007) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Nos. XDB32050200, XDC01020000).
Abstract: Deep neural networks (DNNs) have drawn great attention as they achieve state-of-the-art results on many tasks. Compared to DNNs, spiking neural networks (SNNs), which are considered the new generation of neural networks, fail to achieve comparable performance, especially on tasks with large problem sizes. Much previous work has tried to close the gap between DNNs and SNNs, but only with small networks on simple tasks. This work proposes a simple but effective way to construct deep spiking neural networks (DSNNs) by transferring the learned ability of DNNs to SNNs. DSNNs achieve comparable accuracy on large networks and complex datasets.
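The abstract does not detail the transfer procedure, so the sketch below only illustrates one common way such a transfer can work: reusing trained DNN weights in rate-coded integrate-and-fire neurons so that output firing rates approximate the original activations. The function names, threshold, and time-step count are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def if_layer_spikes(spikes_in, weight, threshold=1.0):
    """Simulate one integrate-and-fire layer driven by input spike trains.

    spikes_in: (timesteps, n_in) binary spike array
    weight:    (n_in, n_out) weights copied from the trained DNN layer
    """
    timesteps, n_out = spikes_in.shape[0], weight.shape[1]
    membrane = np.zeros(n_out)
    spikes_out = np.zeros((timesteps, n_out))
    for t in range(timesteps):
        membrane += spikes_in[t] @ weight      # integrate weighted input
        fired = membrane >= threshold          # fire where threshold is reached
        spikes_out[t] = fired
        membrane[fired] -= threshold           # soft reset keeps the residue
    return spikes_out

# Rate-code a normalized analog input into Poisson spike trains, then drive
# the converted layer with the DNN's own (here randomly generated) weights.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=64)             # e.g. normalized pixel values
T = 100
input_spikes = (rng.uniform(size=(T, 64)) < x).astype(np.float32)

w_dnn = rng.normal(scale=0.1, size=(64, 32))   # stand-in for trained weights
out_spikes = if_layer_spikes(input_spikes, w_dnn)
firing_rate = out_spikes.mean(axis=0)          # approximates the DNN activation
```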
Funding: Supported by the National Key Research and Development Program of China (Nos. 2017YFA0700902, 2017YFB1003101), the 973 Program of China (No. 2015CB358800), and the National Science and Technology Major Project (No. 2018ZX01031102).
Abstract: Deep learning accelerators (DLAs) have proven to be efficient computational devices for processing deep learning algorithms. Various DLA architectures have been proposed and applied to different applications and tasks. However, for most DLAs, the programming interfaces are either difficult to use or not efficient enough. Most DLAs require programmers to write instructions directly, which is time-consuming and error-prone. Another prevailing programming interface for DLAs is high-performance libraries and deep learning frameworks, which are easy to use and friendly to users, but their high abstraction level limits their control over hardware resources and thus compromises the efficiency of the accelerator. This work presents a programming interface design for DLAs. First, various existing DLAs and their programming methods are analyzed, and a methodology for designing programming interfaces for DLAs is proposed, consisting of a high-level assembly language (called DLA-AL), an assembler, and a runtime for DLAs. DLA-AL is composed of a low-level assembly language and a set of high-level blocks. It allows experienced experts to fully exploit the potential of DLAs and achieve near-optimal performance. Meanwhile, by using DLA-AL, end-users who have little knowledge of the hardware are able to develop deep learning algorithms on DLAs with minimal programming effort.
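The abstract does not specify DLA-AL's actual syntax, so the following Python sketch only illustrates the layering idea: a low-level instruction emitter that experts can drive directly, plus a high-level block composed from it for end-users. All class names, opcodes, and buffer names are hypothetical.

```python
class LowLevelProgram:
    """Hypothetical low-level layer: programmers emit accelerator
    instructions one by one and manage on-chip buffers themselves."""

    def __init__(self):
        self.instructions = []

    def emit(self, opcode, *operands):
        self.instructions.append((opcode, *operands))
        return self

def conv2d_block(prog, src, weights, dst, tile=64):
    """Hypothetical high-level block: a convolution expressed as a reusable
    sequence of low-level instructions, hiding buffer management."""
    prog.emit("LOAD", src, "buf0", tile)       # stage input tile on-chip
    prog.emit("LOAD", weights, "buf1", tile)   # stage weight tile on-chip
    prog.emit("CONV", "buf0", "buf1", "buf2")  # run the compute unit
    prog.emit("STORE", "buf2", dst, tile)      # write the result back to DRAM
    return prog

# Expert path: full control over every instruction emitted.
expert = LowLevelProgram().emit("LOAD", "ddr:x", "buf0", 64).emit("RELU", "buf0", "buf0")

# End-user path: compose high-level blocks without touching instructions.
user = conv2d_block(LowLevelProgram(), "ddr:x", "ddr:w", "ddr:y")
print(user.instructions)
```

The design point this mirrors is that both paths lower to the same instruction stream, so expert-tuned code and block-composed code can coexist in one program.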