Abstract: Quantized neural networks (QNNs), which use low-bitwidth numbers for representing parameters and performing computations, have been proposed to reduce computation complexity, storage size, and memory usage. In QNNs, parameters and activations are uniformly quantized, so that multiplications and additions can be accelerated by bitwise operations. However, the distributions of parameters in neural networks are often imbalanced, and a uniform quantization determined from extremal values may therefore underutilize the available bitwidth. In this paper, we propose a novel quantization method that ensures a balanced distribution of quantized values. Our method first recursively partitions the parameters by percentiles into equal-population bins and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the added training overhead. Overall, our method improves the prediction accuracy of QNNs without introducing extra computation during inference, has negligible impact on training speed, and is applicable to both convolutional and recurrent neural networks. Experiments on standard datasets, including ImageNet and Penn Treebank, confirm the effectiveness of our method. On ImageNet, the top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, surpassing the previous state of the art for QNNs.
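As a rough illustration of the idea (a toy NumPy sketch, not the authors' implementation; `balanced_quantize` and the synthetic weight tensor are assumptions), the following snippet partitions weights into equal-population bins via percentiles and then maps each bin onto a uniform grid of levels:

```python
import numpy as np

def balanced_quantize(weights, bits=4):
    """Equal-population binning followed by uniform levels (illustrative sketch)."""
    n_levels = 2 ** bits
    flat = weights.ravel()

    # Percentile boundaries give bins that each hold roughly the same number of
    # parameters (equivalent to recursive median splits for power-of-two bin counts).
    boundaries = np.percentile(flat, np.linspace(0, 100, n_levels + 1))

    # Assign every weight the index of its (equal-population) bin ...
    idx = np.clip(np.searchsorted(boundaries, flat, side="right") - 1,
                  0, n_levels - 1)

    # ... and place those indices on a uniform grid over the original range,
    # so ordinary uniform-quantization arithmetic still applies downstream.
    levels = np.linspace(flat.min(), flat.max(), n_levels)
    return levels[idx].reshape(weights.shape)

w = np.random.randn(256, 256) ** 3        # deliberately imbalanced (heavy-tailed) weights
wq = balanced_quantize(w, bits=4)
counts = np.unique(wq, return_counts=True)[1]
print(len(counts), counts.min(), counts.max())   # 16 levels, each used roughly equally
```

With plain min/max uniform quantization, the same heavy-tailed tensor would concentrate most weights in a few central levels; the percentile step is what keeps every level populated.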
Abstract: The high performance of state-of-the-art deep neural networks (DNNs) comes at the cost of substantial computing resources. Network quantization has recently been recognized as a promising way to reduce this resource usage significantly. However, previous quantization work has mostly focused on DNN inference, and very few works address the challenges of DNN training. In this paper, we leverage a dynamic fixed-point (DFP) quantization algorithm and a stochastic rounding (SR) strategy to develop fully quantized 8-bit neural networks targeting low-bitwidth training. Experiments show that, compared with full-precision networks, the accuracy drop of our quantized convolutional neural networks (CNNs) can be kept below 2%, even for deep models evaluated on the ImageNet dataset. Additionally, our 8-bit GNMT translation network achieves a BLEU score almost identical to that of the full-precision network. We further implement a prototype on an FPGA, and the synthesis results show that the low-bitwidth training scheme reduces resource usage significantly.
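For intuition, here is a minimal sketch (assumptions, not the paper's code: the per-tensor exponent choice and the helper names `stochastic_round`/`dfp_quantize` are illustrative) of combining dynamic fixed-point scaling with stochastic rounding so that low-bitwidth training updates remain unbiased in expectation:

```python
import numpy as np

def stochastic_round(x):
    # Round up with probability equal to the fractional part, down otherwise,
    # so the rounded value is unbiased in expectation.
    floor = np.floor(x)
    return floor + (np.random.rand(*x.shape) < (x - floor))

def dfp_quantize(x, bits=8):
    """Dynamic fixed-point (DFP) quantization with stochastic rounding (SR), sketch only:
    the shared fractional length is chosen per tensor from its current dynamic range."""
    # Pick the exponent so the largest magnitude still fits in `bits` signed bits.
    max_val = np.max(np.abs(x)) + 1e-12
    frac_bits = bits - 1 - int(np.ceil(np.log2(max_val)))
    scale = 2.0 ** frac_bits

    q = stochastic_round(x * scale)
    q = np.clip(q, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)  # saturate to the 8-bit range
    return q / scale  # dequantized value used by the next training step

g = np.random.randn(1024) * 1e-3          # e.g. small gradients encountered during training
print(np.mean(dfp_quantize(g) - g))       # close to zero: SR keeps the quantized estimate unbiased
```

Deterministic rounding of such small gradients would often truncate them to zero and stall training, which is why SR matters specifically for the training (rather than inference) setting.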
Funding: The Academic Colleges and Universities Innovation Program 2.0 (No. BP0719013).
Abstract: To address the hardware deployment problem caused by the heavy computational complexity of convolutional layers and the limited hardware resources available for network inference, a look-up table (LUT)-based convolution architecture built on a field-programmable gate array using integer multipliers and addition trees is used. With the help of the Winograd algorithm, convolution and multiplication are optimized to reduce computational complexity. The LUT-based operator is further optimized to construct a processing unit (PE). At the same time, optimized storage streams improve memory-access efficiency and relieve bandwidth constraints, and the data toggle rate is reduced to lower power consumption. Experimental results show that using the Winograd algorithm to build the basic processing units significantly reduces the number of multipliers and accelerates hardware deployment, while time-division multiplexing of the processing units improves resource utilization. Under these experimental conditions, compared with the traditional convolution method, the architecture reduces computing-resource usage by a factor of 2.25 and improves peak throughput by a factor of 19.3. The LUT-based Winograd accelerator can therefore effectively solve the deployment problem caused by limited hardware resources.
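To see why Winograd reduces the multiplier count, here is a small NumPy sketch of the standard F(2, 3) minimal-filtering identity (an illustration of the algorithm only, not the FPGA architecture described above): two outputs of a 3-tap sliding dot product are computed with 4 multiplications instead of 6.

```python
import numpy as np

# Standard Winograd F(2, 3) transform matrices.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G   = np.array([[1.0,  0.0, 0.0],
                [0.5,  0.5, 0.5],
                [0.5, -0.5, 0.5],
                [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs of the sliding dot product."""
    U = G @ g            # filter transform (can be precomputed and stored, e.g. in LUTs)
    V = B_T @ d          # input-tile transform (additions/subtractions only)
    return A_T @ (U * V) # 4 element-wise multiplications + inverse transform

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
print(winograd_f23(d, g))                    # [-0.5, 0.0]
print([d[i:i + 3] @ g for i in range(2)])    # direct computation gives the same result
```

In the 2-D case (F(2x2, 3x3)), the same identity cuts the multiplications per output tile from 36 to 16, which is what makes the multiplier savings on an FPGA worthwhile.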
Abstract: This paper presents a new VQ+DPCM+DCT algorithm for image coding based on the self-organizing feature map (SOFM) algorithm. In addition, a frequency-sensitive SOFM (FSOFM) has also been developed. Simulation results show that very good visual quality of the coded image is obtained at 0.252 bits/pixel.
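As a hedged sketch of the SOFM-as-vector-quantizer idea (the helper `train_sofm_codebook`, the hyperparameters, and the synthetic image blocks are assumptions, not the paper's method), a 1-D map can be trained so that its codewords serve as a VQ codebook:

```python
import numpy as np

def train_sofm_codebook(blocks, n_codes=64, epochs=10, lr0=0.5, sigma0=None):
    """Train a 1-D self-organizing feature map whose weight vectors act as VQ codewords."""
    rng = np.random.default_rng(0)
    dim = blocks.shape[1]
    codebook = rng.standard_normal((n_codes, dim)) * 0.1
    sigma0 = sigma0 or n_codes / 4
    positions = np.arange(n_codes)

    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)                 # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 1e-3    # shrinking neighbourhood
        for x in rng.permutation(blocks):
            # Best-matching unit: the codeword nearest to the input block.
            bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
            # Pull the BMU and its neighbours on the 1-D map toward the input.
            h = np.exp(-((positions - bmu) ** 2) / (2 * sigma ** 2))
            codebook += lr * h[:, None] * (x - codebook)
    return codebook

# Example: 4x4 image blocks flattened to 16-D vectors (synthetic data here).
blocks = np.random.rand(2000, 16)
codebook = train_sofm_codebook(blocks)
indices = np.argmin(np.linalg.norm(blocks[:, None] - codebook[None], axis=2), axis=1)
print(indices[:10])   # each block is coded by the index of its nearest codeword
```

A frequency-sensitive variant would additionally penalize codewords that win too often during BMU selection, spreading usage more evenly across the codebook.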
Funding: the National Natural Science Foundation of China (Grant Nos. 61972167 and 61802135); the Project of Guangxi Science and Technology (Grant No. GuiKeAD21075030); the Guangxi "Bagui Scholar" Teams for Innovation and Research Project; the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing; the Guangxi Talent Highland Project of Big Data Intelligence and Application; the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (Grant No. 202000012).
Abstract: Existing dish-recognition algorithms mainly focus on accuracy with predefined classes, which limits their application scope. In this paper, we propose a practical two-stage dish recognition framework (DRNet) that yields a trade-off between speed and accuracy while adapting to variation in the number of classes. In the first stage, we build an arbitrary-oriented dish detector (AODD) to localize dish positions, which effectively alleviates the impact of background noise and pose variations. In the second stage, we propose a dish re-identifier (DReID) that recognizes the registered dishes and thus handles uncertain categories. To further improve the accuracy of DRNet, we design an attribute recognition (AR) module to predict the attributes of dishes; these attributes serve as auxiliary information that enhances the discriminative ability of DRNet. Moreover, pruning and quantization are applied to the model so that it can be deployed in embedded environments. Finally, to facilitate the study of dish recognition, a well-annotated dataset is established. AODD, DReID, AR, and DRNet run at about 14, 25, 16, and 5 fps, respectively, on RKNN 3399 Pro hardware.
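A minimal sketch of such a detect-then-re-identify pipeline is shown below (purely hypothetical: `detector`, `embedder`, `gallery`, and the threshold are placeholders, not the authors' API); the key property is that adding a new dish class only requires registering a new embedding, with no retraining of the classifier head.

```python
import numpy as np

def recognize_dishes(image, detector, embedder, gallery, threshold=0.6):
    """Two-stage recognition sketch: oriented detection, then re-identification
    against a gallery of registered dish embeddings."""
    names, gallery_feats = gallery                 # (list of dish names, [N, D] L2-normalised array)
    results = []
    # Stage 1: the detector returns oriented boxes plus rotation-corrected crops.
    for box, crop in detector(image):
        # Stage 2: embed the crop and match it against every registered dish.
        feat = embedder(crop)                      # [D] L2-normalised embedding
        sims = gallery_feats @ feat                # cosine similarity to each registered dish
        best = int(np.argmax(sims))
        label = names[best] if sims[best] >= threshold else "unknown"
        results.append((box, label, float(sims[best])))
    return results
```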