Funding: supported by the National Key Research and Development Program of China (No. 2022YFB4501601); the National Natural Science Foundation of China (Nos. 62102398, U20A20227, 62222214, 62002338, U22A2028, and U19B2019); the Chinese Academy of Sciences Project for Young Scientists in Basic Research (YSBR-029); and the Youth Innovation Promotion Association, Chinese Academy of Sciences.
Abstract: Quantized training has proven to be a prominent method for training deep neural networks under limited computational resources. It uses low bit-width arithmetic with a proper scaling factor to achieve negligible accuracy loss. Cambricon-Q is an ASIC design proposed to efficiently support quantized training, and it achieves significant performance improvement. However, there are still two caveats in the design. First, Cambricon-Q instances with different hardware specifications may introduce different numerical errors, resulting in non-reproducible behavior that may become a major concern in critical applications. Second, Cambricon-Q cannot leverage data sparsity, from which considerable cycles could still be squeezed out. To address these caveats, the acceleration core of Cambricon-Q is redesigned to support fine-grained irregular data processing. The new design not only enables acceleration on sparse data, but also performs local dynamic quantization over contiguous value ranges (which is hardware independent) instead of contiguous addresses (which depends on hardware factors). Experimental results show that the accuracy loss of the method remains negligible, and the accelerator achieves a 1.61× performance improvement over Cambricon-Q, with about a 10% energy increase.
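To illustrate the idea of dynamic quantization with a scaling factor that the abstract describes, the following is a minimal NumPy sketch of symmetric int8 quantization. The function names and the per-tensor granularity are assumptions for illustration only, not Cambricon-Q's actual hardware scheme (which quantizes locally over groups of values):

```python
import numpy as np

def quantize_dynamic(x, bits=8):
    """Symmetric dynamic quantization: the scale is derived from the
    current value range, so low bit-width integers cover the data well."""
    qmax = 2 ** (bits - 1) - 1          # 127 for int8
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original values."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.3, 0.0], dtype=np.float32)
q, s = quantize_dynamic(x)
x_hat = dequantize(q, s)               # close to x, error bounded by ~s/2
```

Because the scale is recomputed from the observed value range, the worst-case rounding error per element stays within about half a quantization step, which is why accuracy loss can remain negligible.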
Funding: supported by the National Natural Science Foundation of China (No. 61732007) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB32050200, XDC01020000).
Abstract: Deep neural networks (DNNs) have drawn great attention as they achieve state-of-the-art results on many tasks. Compared to DNNs, spiking neural networks (SNNs), which are considered the next generation of neural networks, fail to achieve comparable performance, especially on tasks with large problem sizes. Much previous work has tried to close the gap between DNNs and SNNs, but used small networks on simple tasks. This work proposes a simple but effective way to construct deep spiking neural networks (DSNNs) by transferring the learned ability of DNNs to SNNs. DSNNs achieve comparable accuracy on large networks and complex datasets.
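To make the DNN-to-SNN transfer concrete: one common approach in this literature is rate-coded conversion, where a trained DNN's weights are copied and each ReLU is replaced by an integrate-and-fire neuron whose firing rate approximates the ReLU output. The sketch below is an illustrative assumption about how such a transfer can work, not necessarily the specific DSNN construction the abstract proposes:

```python
def if_neuron_rate(inp, T=200, v_th=1.0):
    """Simulate an integrate-and-fire neuron for T time steps with a
    constant input current; the spike rate approximates ReLU(inp)."""
    v = 0.0
    spikes = 0
    for _ in range(T):
        v += inp                # integrate the input current
        if v >= v_th:           # threshold crossing emits a spike
            v -= v_th           # reset by subtraction keeps the residual
            spikes += 1
    return spikes / T

rate = if_neuron_rate(0.25)     # approximates ReLU(0.25) = 0.25
```

Negative inputs never reach the threshold, so the rate is zero there, mirroring ReLU; longer simulation windows T trade latency for a closer rate approximation.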
Funding: partially supported by the National Key Research and Development Program of China (Grants 2017YFB1003101, 2018AAA0103300, 2017YFA0700900, 2017YFA0700902, and 2017YFA0700901); the National Natural Science Foundation of China (Grants 61732007, 61432016, 61532016, 61672491, 61602441, 61602446, 61732002, 61702478, and 61732020); the Beijing Natural Science Foundation (JQ18013); the National Science and Technology Major Project (2018ZX01031102); the Transformation and Transfer of Scientific and Technological Achievements of the Chinese Academy of Sciences (KFJ-HGZX-013); Key Research Projects in Frontier Science of the Chinese Academy of Sciences (QYZDBSSW-JSC001); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB32050200, XDC01020000); the Standardization Research Project of the Chinese Academy of Sciences (BZ201800001); the Beijing Academy of Artificial Intelligence (BAAI); and the Beijing Nova Program of Science and Technology (Z191100001119093).
Abstract: In recent years, deep learning algorithms have been widely deployed from cloud servers to terminal devices, and researchers have proposed various neural network accelerators and software development environments. In this article, we review representative neural network accelerators. Since the software stack must take the hardware architecture of a specific accelerator into account to deliver end-to-end performance, we also summarize the programming environments of neural network accelerators and the optimizations in their software stacks. Finally, we comment on future trends in neural network accelerators and programming environments.