Abstract: The quantization algorithm compresses the original network by reducing the numerical bit width of the model, which improves the computation speed. Because different layers have different redundancy and different sensitivity to the data bit width, reducing the bit width results in a loss of accuracy, and it is therefore difficult to determine the optimal bit width for each part of the network while guaranteeing accuracy. Mixed precision quantization can effectively reduce the amount of computation while keeping the model accuracy essentially unchanged. In this paper, a hardware-aware algorithm for the optimal assignment of mixed precision quantization strategies at low bit widths is proposed, and reinforcement learning is used to automatically predict a mixed precision assignment that meets the constraints of the hardware resources. In the state-space design, the standard deviation of the weights is used to measure differences in the data distribution, the execution-speed feedback of a simulated neural network accelerator performing inference is used as the environment that limits the action space of the agent, and the accuracy of the quantized model after retraining is used as the reward function that guides the deep reinforcement learning training of the agent. The experimental results show that the proposed method obtains a suitable layer-by-layer quantization strategy for the model while satisfying the computational resource constraints, and the model accuracy is effectively improved. The proposed method is highly automated, has a certain generality, and has strong application potential in mixed precision quantization and in the deployment of embedded neural network models.
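To make the per-layer decision concrete, the following minimal Python sketch (not from the paper) quantizes one layer's weights to a chosen bit width with a symmetric uniform quantizer and computes the weight standard deviation that the abstract describes as a state feature. The function names and the particular quantizer form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_weights(w, bit_width):
    """Symmetric uniform quantization of a weight tensor to `bit_width` bits.
    Illustrative quantizer only; the paper's scheme may differ."""
    levels = 2 ** (bit_width - 1) - 1                 # e.g. 127 for 8 bits
    max_abs = np.max(np.abs(w))
    scale = max_abs / levels if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale                                  # de-quantized weights

def layer_state(w, bit_width):
    """State features for one layer: weight std (distribution spread) and current bit width."""
    return np.array([np.std(w), float(bit_width)])

# Example: quantize one layer to 4 bits and inspect the induced error.
w = np.random.randn(256, 128).astype(np.float32)
w_q = quantize_weights(w, bit_width=4)
print("state:", layer_state(w, 4), " quantization MSE:", np.mean((w - w_q) ** 2))
```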
Funding: Supported by the Serbian Ministry of Education and Science through the Mathematical Institute of the Serbian Academy of Sciences and Arts (Project III44006) and by the Serbian Ministry of Education and Science (Project TR32035).
Abstract: In this paper, the segment threshold of a quantizer is optimized. The quantizer is designed on the basis of approximative spline functions. The coefficients from which the approximative spline functions are formed are calculated by minimizing the mean square error (MSE). For the coefficients determined in this way, the spline functions by which the optimal compressor function is approximated are obtained. For the quantizer designed on the basis of these approximative spline functions, the segment threshold is determined numerically as the value that maximizes the signal-to-quantization-noise ratio (SQNR). A quantizer with an optimized segment threshold is thereby obtained. It is shown that the quantizer model designed in this way and proposed in this paper achieves an SQNR very close to that of the nonlinear optimal companding quantizer.
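To illustrate the SQNR figure of merit used above, the sketch below builds a companding quantizer and measures SQNR on Gaussian test samples. It assumes a standard mu-law compressor in place of the paper's spline-approximated compressor function, so it is a baseline sketch of the metric, not the proposed quantizer.

```python
import numpy as np

def mu_law_compress(x, mu=255.0, xmax=1.0):
    return xmax * np.sign(x) * np.log1p(mu * np.abs(x) / xmax) / np.log1p(mu)

def mu_law_expand(y, mu=255.0, xmax=1.0):
    return xmax * np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu) / xmax) / mu

def companding_quantize(x, n_levels=256, mu=255.0, xmax=1.0):
    """Compress, apply a uniform mid-rise quantizer in the compressed domain, expand."""
    y = mu_law_compress(np.clip(x, -xmax, xmax), mu, xmax)
    step = 2.0 * xmax / n_levels
    y_q = (np.floor(y / step) + 0.5) * step
    return mu_law_expand(y_q, mu, xmax)

def sqnr_db(x, x_q):
    """Signal-to-quantization-noise ratio in decibels."""
    return 10.0 * np.log10(np.mean(x ** 2) / np.mean((x - x_q) ** 2))

x = 0.3 * np.random.randn(100_000)            # zero-mean Gaussian test signal
x_q = companding_quantize(x, n_levels=256)
print("SQNR [dB]:", sqnr_db(x, x_q))
```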
Funding: Supported by the National Key R&D Program of China (No. 2020YFB1807100), the National Natural Science Foundation of China (No. 62001310), and the Guangdong Basic and Applied Basic Research Foundation, China (No. 2022A1515010109).
Abstract: Training a machine learning model with federated edge learning (FEEL) is typically time consuming due to the constrained computation power of edge devices and the limited wireless resources in edge networks. In this study, the training time minimization problem is investigated in a quantized FEEL system, where heterogeneous edge devices send quantized gradients to the edge server via orthogonal channels. In particular, a stochastic quantization scheme is adopted to compress the uploaded gradients, which reduces the per-round communication burden but may come at the cost of an increased number of communication rounds. The training time is modeled by taking into account the communication time, the computation time, and the number of communication rounds. Based on the proposed training time model, the intrinsic trade-off between the number of communication rounds and the per-round latency is characterized. Specifically, we analyze the convergence behavior of quantized FEEL in terms of the optimality gap. Furthermore, a joint data- and model-driven fitting method is proposed to obtain the exact optimality gap, based on which closed-form expressions for the number of communication rounds and the total training time are obtained. Constrained by the total bandwidth, the training time minimization problem is formulated as a joint quantization-level and bandwidth-allocation optimization problem. To this end, an algorithm based on alternating optimization is proposed, which alternately solves the subproblem of quantization optimization through successive convex approximation and the subproblem of bandwidth allocation by bisection search. With different learning tasks and models, the simulation results validate our analysis and demonstrate the near-optimal performance of the proposed optimization algorithm.
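The stochastic gradient quantizer referred to above can be sketched as an unbiased, QSGD-style scheme with a configurable number of levels per coordinate. The sketch below is an assumption for illustration; the exact scheme and level mapping used in the paper are not reproduced here.

```python
import numpy as np

def stochastic_quantize(g, num_levels):
    """Unbiased stochastic quantization of a gradient vector onto `num_levels`
    uniform levels per coordinate (QSGD-style sketch)."""
    norm = np.linalg.norm(g)
    if norm == 0:
        return np.zeros_like(g)
    scaled = np.abs(g) / norm * num_levels          # in [0, num_levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                        # round up with this probability
    levels = lower + (np.random.rand(*g.shape) < prob_up)
    return np.sign(g) * levels * norm / num_levels

# The quantizer is unbiased: averaging many independent draws recovers the gradient.
g = np.random.randn(1000)
q_mean = np.mean([stochastic_quantize(g, num_levels=4) for _ in range(2000)], axis=0)
print("max |bias| estimate:", np.max(np.abs(q_mean - g)))
```

Fewer levels shrink the payload each device uploads per round but raise the quantization variance, which is exactly the rounds-versus-per-round-latency trade-off the abstract characterizes.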
Funding: Supported by the U.S. Department of Energy under Award DE-SC-0001691, by the ORAU Ralph E. Powe Junior Faculty Enhancement Award, and by the National Science Foundation under grants DMS-1056821 and DMS-0915013.
Abstract: In a variety of modern applications there arises a need to tessellate a domain into representative regions called Voronoi cells. A particular type of such tessellation, the centroidal Voronoi tessellation (CVT), is in great demand because of optimality properties that are important for many applications. The availability of fast and reliable algorithms for constructing CVTs is crucial for their successful use in practical settings. This paper introduces a new multigrid algorithm for constructing CVTs that is based on the MG/Opt algorithm, which was originally designed to solve large nonlinear optimization problems. Uniform convergence of the new method and its speedup compared with existing techniques are demonstrated for linear and nonlinear densities on several 1D and 2D problems, and an O(k) complexity estimate is provided for a problem with k generators.
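For reference, the sketch below implements a plain Lloyd iteration for a 1D CVT on [0, 1] with a prescribed density: each generator moves to the density-weighted centroid of its Voronoi cell until the configuration stabilizes. This is the standard single-level baseline that multigrid schemes such as the MG/Opt-based method aim to accelerate, not the algorithm of the paper itself; the grid resolution and iteration count are illustrative choices.

```python
import numpy as np

def lloyd_cvt_1d(generators, density, grid_pts=4096, iters=200):
    """Plain Lloyd iteration for a 1D CVT on [0, 1] with the given density."""
    x = np.linspace(0.0, 1.0, grid_pts)
    rho = density(x)
    g = np.sort(np.asarray(generators, dtype=float))
    for _ in range(iters):
        # Voronoi cell boundaries are midpoints between neighbouring generators.
        bounds = np.concatenate(([0.0], 0.5 * (g[:-1] + g[1:]), [1.0]))
        cell = np.clip(np.searchsorted(bounds, x, side="right") - 1, 0, len(g) - 1)
        # Move each generator to the density-weighted centroid of its cell.
        mass = np.bincount(cell, weights=rho, minlength=len(g))
        moment = np.bincount(cell, weights=rho * x, minlength=len(g))
        g = np.sort(np.where(mass > 0, moment / np.maximum(mass, 1e-12), g))
    return g

# Example: 8 generators under the linear density rho(x) = 1 + x.
print(lloyd_cvt_1d(np.random.rand(8), density=lambda x: 1.0 + x))
```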