Journal Articles
5 articles found.
1. Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks [Cited by: 4]
Authors: Shu-Chang Zhou, Yu-Zhi Wang, He Wen, Qin-Yao He, Yu-Heng Zou. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, Issue 4, pp. 667-682 (16 pages).
Abstract: Quantized neural networks (QNNs), which use low bitwidth numbers for representing parameters and performing computations, have been proposed to reduce computation complexity, storage size, and memory usage. In QNNs, parameters and activations are uniformly quantized, so that multiplications and additions can be accelerated by bitwise operations. However, the distributions of parameters in neural networks are often imbalanced, and a uniform quantization determined from extremal values may underutilize the available bitwidth. In this paper, we propose a novel quantization method that ensures balanced distributions of quantized values. Our method first recursively partitions the parameters by percentiles into balanced bins and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the overhead this step introduces. Overall, our method improves the prediction accuracy of QNNs without introducing extra computation during inference, has negligible impact on training speed, and is applicable to both convolutional and recurrent neural networks. Experiments on standard datasets, including ImageNet and Penn Treebank, confirm the effectiveness of our method. On ImageNet, the top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, surpassing the state of the art for QNNs.
Keywords: quantized neural network; percentile; histogram equalization; uniform quantization
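As a rough illustration of the balanced-binning idea in this abstract, the NumPy sketch below partitions a weight tensor into equal-population bins via percentiles and then maps each bin onto uniformly spaced levels. It is an illustrative approximation only (the percentile edges are computed directly rather than by the paper's recursive median splits), and the function and parameter names are ours, not the authors'.

```python
import numpy as np

def balanced_quantize(weights, bits=4):
    """Map weights onto 2**bits uniformly spaced levels such that each
    level receives roughly the same number of weights."""
    num_levels = 2 ** bits
    flat = weights.ravel()
    # Percentile boundaries that put an equal share of the values into
    # every bin (the paper derives these by recursive median splits).
    edges = np.percentile(flat, np.linspace(0, 100, num_levels + 1))
    bin_idx = np.digitize(flat, edges[1:-1])          # 0 .. num_levels-1
    # Uniform quantization of the bins: evenly spaced output levels over
    # the value range keep inference friendly to bitwise arithmetic.
    levels = np.linspace(flat.min(), flat.max(), num_levels)
    return levels[bin_idx].reshape(weights.shape)

w = np.random.randn(64, 64) * 0.1                     # toy weight tensor
print(np.unique(balanced_quantize(w, bits=4)).size)   # at most 16 levels
```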
2. Towards high performance low bitwidth training for deep neural networks
Authors: Chunyou Su, Sheng Zhou, Liang Feng, Wei Zhang. Journal of Semiconductors (EI, CAS, CSCD), 2020, Issue 2, pp. 63-72 (10 pages).
Abstract: The high performance of state-of-the-art deep neural networks (DNNs) comes at the cost of huge consumption of computing resources. Network quantization has recently been recognized as a promising solution that significantly reduces resource usage. However, previous quantization work has mostly focused on DNN inference, and very few works address the challenges of DNN training. In this paper, we leverage a dynamic fixed-point (DFP) quantization algorithm and a stochastic rounding (SR) strategy to develop fully quantized 8-bit neural networks targeting low bitwidth training. The experiments show that, in comparison to full-precision networks, the accuracy drop of our quantized convolutional neural networks (CNNs) can be less than 2%, even for deep models evaluated on the ImageNet dataset. Additionally, our 8-bit GNMT translation network achieves almost identical BLEU to the full-precision network. We further implement a prototype on an FPGA, and the synthesis shows that the low bitwidth training scheme can significantly reduce resource usage.
Keywords: CNN; quantized neural networks; limited precision training
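For readers unfamiliar with the two ingredients named in this abstract, the snippet below is a minimal NumPy sketch, under our own assumptions, of dynamic fixed-point quantization with stochastic rounding: the scale follows each tensor's current range, and values round up or down with probability equal to the fractional part, so the rounding error averages out over a training run. It is not the paper's implementation; names and the clipping convention are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x):
    """Round each element up with probability equal to its fractional part."""
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))

def dfp_quantize(tensor, bits=8):
    """Quantize to signed `bits`-bit fixed point with a per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    # "Dynamic" part: the scale tracks the tensor's current magnitude.
    scale = np.max(np.abs(tensor)) / qmax if np.any(tensor) else 1.0
    q = np.clip(stochastic_round(tensor / scale), -qmax - 1, qmax)
    return q * scale                       # dequantized value used downstream

grads = np.random.randn(256, 256) * 1e-3   # e.g. gradients during training
print(np.abs(dfp_quantize(grads) - grads).mean())
```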
3. WinoNet: Reconfigurable look-up table-based Winograd accelerator for arbitrary precision convolutional neural network inference
Authors: Wang Chengcheng, Li He, Cao Yanpeng, Song Changjun, Yu Feng, Tang Yongming. Journal of Southeast University (English Edition) (EI, CAS), 2022, Issue 4, pp. 332-339 (8 pages).
Abstract: To solve the hardware deployment problem caused by the vast computational complexity of convolutional layers and the limited hardware resources available for network inference, a look-up table (LUT)-based convolution architecture built on a field-programmable gate array using integer multipliers and addition trees is used. With the help of the Winograd algorithm, convolution and multiplication are optimized to reduce the computational complexity. The LUT-based operator is further optimized to construct a processing unit (PE). Simultaneously, optimized storage streams improve memory access efficiency and relieve bandwidth constraints, and the data toggle rate is reduced to lower power consumption. The experimental results show that using the Winograd algorithm to build the basic processing units significantly reduces the number of multipliers and accelerates hardware deployment, while time-division multiplexing of the processing units improves resource utilization. Under these experimental conditions, compared with the traditional convolution method, the architecture improves computing-resource efficiency by 2.25 times and peak throughput by 19.3 times. The LUT-based Winograd accelerator can effectively solve the deployment problem caused by limited hardware resources.
Keywords: quantized neural networks; look-up table (LUT)-based multiplier; Winograd algorithm; arbitrary precision
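The multiplication savings that make the Winograd approach attractive for a LUT-based design show up already in the smallest kernel, F(2,3): two outputs of a 3-tap convolution cost 4 element-wise multiplies instead of 6, and those multiplies are the part a LUT-based operator would replace with table look-ups. The sketch below uses the standard F(2,3) transform matrices; it illustrates the algorithm, not the WinoNet hardware.

```python
import numpy as np

# Standard F(2,3) Winograd transform matrices.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G   = np.array([[1,    0,   0],
                [0.5,  0.5, 0.5],
                [0.5, -0.5, 0.5],
                [0,    0,   1]], dtype=float)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 convolution outputs."""
    U = G @ g            # filter transform (precomputable offline)
    V = B_T @ d          # input-tile transform
    M = U * V            # only 4 multiplications (the costly part in hardware)
    return A_T @ M       # inverse transform back to 2 outputs

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
ref = np.array([d[i:i + 3] @ g for i in range(2)])   # direct sliding window
print(winograd_f23(d, g), ref)                       # the two results match
```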
4. A New Image Coding Algorithm Based on Self-Organizing Neural Network [Cited by: 1]
Authors: Li Hongsong, Quan Ziyi. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 1995, Issue 1, pp. 40-43 (4 pages).
Abstract: The paper presents a new VQ+DPCM+DCT algorithm for image coding based on Self-Organizing Feature Maps (SOFM). In addition, a frequency-sensitive SOFM (FSOFM) has also been developed. Simulation results show that very good visual quality of the coded image is obtained at 0.252 bits/pixel.
Keywords: image coding; vector quantization (VQ); self-organizing neural network
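A rough software analogue of the frequency-sensitive codebook training mentioned in this abstract: code vectors that win too often are penalized during the winner search, which balances codebook usage when vector-quantizing image blocks. The parameter names and learning schedule below are our own illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_fs_codebook(blocks, codebook_size=64, epochs=5, lr=0.1):
    """blocks: (N, D) training vectors, e.g. flattened 4x4 image blocks."""
    codebook = blocks[rng.choice(len(blocks), codebook_size, replace=False)].copy()
    win_count = np.ones(codebook_size)
    for _ in range(epochs):
        for x in blocks[rng.permutation(len(blocks))]:
            dist = np.sum((codebook - x) ** 2, axis=1)
            winner = np.argmin(dist * win_count)     # frequency-sensitive choice
            win_count[winner] += 1
            codebook[winner] += lr * (x - codebook[winner])
    return codebook

def encode(blocks, codebook):
    """Return the index of the nearest code vector for each block."""
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

blocks = rng.normal(size=(1000, 16))                 # toy 4x4 image blocks
cb = train_fs_codebook(blocks)
print(encode(blocks[:5], cb))
```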
5. DRNet: Towards fast, accurate and practical dish recognition [Cited by: 1]
Authors: CHENG SiYuan, CHU BinFei, ZHONG BiNeng, ZHANG ZiKai, LIU Xin, TANG ZhenJun, LI XianXian. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2021, Issue 12, pp. 2651-2661 (11 pages).
Abstract: Existing dish recognition algorithms mainly focus on accuracy over predefined classes, which limits their application scope. In this paper, we propose a practical two-stage dish recognition framework (DRNet) that trades off speed and accuracy while adapting to variation in the number of classes. In the first stage, we build an arbitrary-oriented dish detector (AODD) to localize dish positions, which effectively alleviates the impact of background noise and pose variations. In the second stage, we propose a dish re-identifier (DReID) that recognizes registered dishes in order to handle uncertain categories. To further improve the accuracy of DRNet, we design an attribute recognition (AR) module to predict dish attributes, which serve as auxiliary information to enhance the discriminative ability of DRNet. Moreover, pruning and quantization are applied to our model so that it can be deployed in embedded environments. Finally, to facilitate the study of dish recognition, a well-annotated dataset is established. Our AODD, DReID, AR, and DRNet run at about 14, 25, 16, and 5 fps, respectively, on RKNN 3399 pro hardware.
Keywords: neural network acceleration; neural network quantization; object detection; reidentification; dish recognition
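The "registered dishes" mechanism of DReID can be pictured as nearest-neighbour matching of an embedding against a gallery of enrolled dishes, as in the hedged sketch below. The embedding network itself is out of scope here; `embed` is a placeholder we introduce for illustration and is not part of the paper.

```python
import numpy as np

def build_gallery(registered_crops, embed):
    """registered_crops: list of image arrays; embed: crop -> 1-D feature."""
    feats = np.stack([embed(c) for c in registered_crops])
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def recognize(crop, gallery, embed, threshold=0.5):
    """Return (best_index, similarity); index is None for an unknown dish."""
    f = embed(crop)
    f = f / np.linalg.norm(f)
    sims = gallery @ f                     # cosine similarity to each enrolled dish
    best = int(np.argmax(sims))
    return (best if sims[best] >= threshold else None, float(sims[best]))

# Toy usage with a dummy embedding (per-channel mean) standing in for a CNN.
embed = lambda img: img.reshape(-1, 3).mean(axis=0)
rng = np.random.default_rng(0)
gallery = build_gallery([rng.random((32, 32, 3)) for _ in range(10)], embed)
print(recognize(rng.random((32, 32, 3)), gallery, embed))
```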