Approximate computing is a low-power design technique that offers an additional degree of freedom in designing digital circuits. Pruning is an approximate circuit design technique that removes logic gates or wires from a circuit to reduce power consumption while introducing minimal error. In this work, a novel machine learning (ML)-based pruning technique is introduced for designing digital circuits. A random forest decision tree algorithm is used to prune nodes selectively based on their input patterns. In addition, an error compensation value is added to the original output to reduce the error rate. Experimental results demonstrate the efficiency of the proposed technique in terms of area, power, and error rate. Compared with conventional pruning, the proposed ML pruning achieves area and delay reductions of 32% and 26%, respectively, in an 8×8 multiplier implementation. Low-power image processing algorithms are essential in applications such as image compression and enhancement. For real-time evaluation, the proposed ML-optimized pruning is applied to the discrete cosine transform (DCT), a basic element of image and video processing applications. Experimental results on benchmark images show that the proposed pruning achieves a very good peak signal-to-noise ratio (PSNR) with considerable energy savings compared with other methods.
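The abstract above reports output quality as PSNR without defining it. As a reminder of how that metric is usually computed, here is a minimal sketch using the standard definition, 10·log10(MAX² / MSE). The function name `psnr`, the flat pixel-list representation, and the 8-bit default `max_value=255` are assumptions for illustration, not details taken from the paper.

```python
import math

def psnr(reference, approximate, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if not reference or len(reference) != len(approximate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the exact and the approximate image.
    mse = sum((r - a) ** 2 for r, a in zip(reference, approximate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the approximate output (for example, from a pruned DCT datapath) is closer to the exact one.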
In recent years, approximate computing circuits (ACCs) have been widely used in applications with intrinsic tolerance to errors. With the growing number of approximate computing circuit approaches, reliability analysis methods for assessing their fault vulnerability have become necessary. In this study, two accurate reliability evaluation methods for approximate computing circuits are proposed. The reliability of approximate computing circuits is calculated on the basis of the iterative Probabilistic Transfer Matrix (PTM) model. During the calculation, correlation coefficients are derived and combined to deal with the correlation problem caused by fanout reconvergence. The accuracy and scalability of the two methods are verified using three sets of approximate computing circuit instances and further circuits from EvoApprox8b, an open-source library of approximate computing circuits. Experimental results show that, relative to Monte Carlo simulation, the two methods achieve average error rates of 0.46% and 1.29% and time overheads of 0.002% and 0.1%. Compared with existing approaches to reliability estimation for approximate computing circuits based on the original PTM model, the proposed methods reduce the space overhead by nearly 50% and achieve time overheads of 1.78% and 2.19%.
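To make the PTM idea concrete, the following is a minimal sketch of the basic (non-iterative) PTM calculation for two gates in series. It does not reproduce the paper's iterative model or its correlation coefficients for fanout reconvergence. The helper names, the symmetric gate error probability `eps`, and the uniform input distribution are all assumptions for illustration.

```python
def gate_ptm(truth_table, eps):
    """PTM of a single-output gate: one row per input pattern, columns = P(out=0), P(out=1).
    The gate produces the wrong output with probability eps (assumed symmetric)."""
    return [[(1 - eps) if out == col else eps for col in (0, 1)] for out in truth_table]

def matmul(a, b):
    """Plain matrix product; composes two PTMs connected in series."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def reliability(circuit_ptm, ideal_outputs):
    """Probability that the circuit output is correct, averaged over uniform inputs."""
    return sum(row[out] for row, out in zip(circuit_ptm, ideal_outputs)) / len(ideal_outputs)

NAND = [1, 1, 1, 0]   # outputs for inputs 00, 01, 10, 11
NOT = [1, 0]          # outputs for inputs 0, 1
AND = [0, 0, 0, 1]    # fault-free behaviour of NAND followed by NOT

# NAND followed by an inverter, each failing independently with probability 0.1.
and_ptm = matmul(gate_ptm(NAND, 0.1), gate_ptm(NOT, 0.1))
and_reliability = reliability(and_ptm, AND)
```

Here the output is correct when both gates work or both fail, so the reliability is (1 - eps)² + eps². Larger circuits need tensor products for parallel gates and, as the abstract notes, extra care wherever signals fan out and reconverge.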
Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers. Researchers have conventionally concentrated on the tradeoffs between energy and performance in circuit and system design based on accurate computing. However, as video/image processing and machine learning algorithms have become widespread, the use of approximate computing in these applications has become a hot topic. The errors caused by approximate computing can be tolerated by these applications through specific processing or algorithms, and large improvements in performance or power savings can be achieved with an acceptable loss in final output quality. This paper presents a survey of approximate computing from arithmetic unit design to high-level applications, in which we try to give researchers a comprehensive and insightful understanding of approximate computing. We believe that approximate computing will play an important role in circuit and system design in the future, especially with the rapid development of artificial intelligence algorithms and their related applications.
In edge computing, a reasonable edge resource bidding mechanism can enable edge providers and users to obtain benefits in a relatively fair fashion. To maximize such benefits, this paper proposes a dynamic multiattribute resource bidding mechanism (DMRBM). Most previous work relies on a third-party agent to exchange information in order to gain optimal benefits. It is worth noting that when edge providers and users trade through third-party agents that are not entirely reliable and trustworthy, their sensitive information is prone to leakage. Moreover, the privacy of edge providers and users must be protected during the dynamic pricing and transaction process, which is also very challenging. Therefore, this paper first adopts a privacy protection algorithm to prevent sensitive information from leaking. On the premise that the sensitive data of both edge providers and users are protected, the prices of providers fluctuate within a certain range. Users can then choose appropriate edge providers according to their demands using the price-performance ratio (PPR) standard and the reward of lower price (LPR) standard. The two standards can be evolved through two evaluation functions. Furthermore, this paper employs an approximate computing method to obtain an approximate solution of DMRBM in polynomial time. Specifically, it models the bidding process as a non-cooperative game and obtains the approximate optimal solution based on the two standards according to game theory. Extensive experiments demonstrate that DMRBM satisfies individual rationality, budget balance, and privacy protection, and that it also increases the task offloading rate and the system benefits.
This paper proposes a hardware-efficient implementation of division, which is useful for image processing in WSN edge devices. For error-resilient applications such as image processing, accurate calculations can be unnecessary overhead, and approximate computing, which obtains circuit benefits from inaccurate calculations, is effective. Since there are studies showing sufficient performance with few-bit operations, this paper proposes a combinational arithmetic circuit design of 16 bits or less. The proposed design is an approximate restoring division circuit implemented with a 2-dimensional array of 1-bit subtractor cells. The main drawback of such a design is the long "borrow chain" that traverses all of the rows of the 2-dimensional subtractor array before a final stable quotient result can be produced, resulting in a long delay and excessive power dissipation. This paper proposes two approximate subtractor cell designs, named ABSC and ADSC, that break this borrow chain: the first in the vertical direction and the second in the horizontal direction. The proposed approximate divider designs are compared with an accurate design and previous state-of-the-art designs in terms of accuracy and hardware overhead. The proposed designs have accuracy levels close to the best achieved by previous state-of-the-art approximate divider designs. In addition, the proposed ADSC design had the lowest delay, area, and power. Finally, implementing both proposed designs in two practical applications showed that both provide sufficient division accuracy.
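For readers unfamiliar with restoring division, here is a minimal software sketch of the exact algorithm that the subtractor array implements, one loop iteration per array row. It does not model the ABSC or ADSC approximate cells, which are not described in enough detail in the abstract. The function name and the `n_bits` parameter are assumptions for illustration.

```python
def restoring_divide(dividend, divisor, n_bits=8):
    """Exact unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder, quotient = 0, 0
    for i in range(n_bits - 1, -1, -1):
        # Shift in the next dividend bit (one row of the subtractor array).
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:
            # Subtraction succeeded: keep the difference, quotient bit is 1.
            remainder = trial
            quotient |= 1 << i
        # Otherwise "restore": keep the old remainder, quotient bit stays 0.
    return quotient, remainder
```

In hardware, the sign of each trial subtraction is only known once the borrow has rippled through the whole row, and each row waits on the one above it. That dependency is the borrow chain the abstract refers to.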
The online computational burden of model predictive control (MPC) for large-scale constrained systems hampers its real-time application and limits it to slow dynamic processes with a moderate number of inputs. To avoid this, an efficient and fast algorithm based on aggregation optimization is proposed in this paper. It optimizes only the current control action at time instant k, while the future control sequence over the optimization horizon is approximated offline by a linear feedback control sequence, so that the online optimization is converted into a low-dimensional quadratic programming problem. Input constraints are handled well in this scheme. Performance comparable to that of the existing standard model predictive control algorithm is achieved. Simulation results demonstrate its effectiveness.
The demise of Dennard scaling has created both power and utilization wall challenges for computer systems. Because transistors operating in the near-threshold region can obtain flexible trade-offs between power and performance, near-threshold operation is regarded as an alternative solution to the scaling challenge. A reduction in supply voltage nevertheless creates significant reliability challenges, while maintaining an error-free system incurs high costs in both performance and energy consumption. The main purpose of research on computer architecture has therefore shifted from performance improvement to complex multi-objective optimization. In this paper, we propose a three-dimensional optimization approach that can effectively identify the best system configuration to balance performance, energy, and reliability. We use a dynamic programming algorithm to determine the proper voltage and approximation level based on three predictors: system performance, energy consumption, and output quality. We propose an output quality predictor that uses a hardware/software co-design fault injection platform to evaluate the impact of errors on output quality under near-threshold computing (NTC). Evaluation results demonstrate that our approach can lead to a 28% improvement in output quality with a 10% drop in overall energy efficiency; this translates to an approximately 20% average improvement in accuracy, power, and performance.
Approximate computing has received significant attention in the design of portable CMOS hardware for error-tolerant applications. This work proposes an approximate adder that optimizes area and delay and achieves energy efficiency using Parallel Carry (PC) generation logic. For n-bit inputs, the proposed algorithm uses approximate addition for the n/2 least significant bits and exact addition for the n/2 most significant bits. A simple OR logic with no carry propagation implements the approximate part. In the exact part, addition is performed using 4-bit adder blocks that implement PC at the block level to reduce node capacitance in the critical path. Evaluations reveal that the maximum error of the proposed adder does not exceed 2^(n/2). As an enhancement of the proposed algorithm, an Error Recovery (ER) module is used to reduce the average error. Synthesis results for the Proposed-PC (P-PC) and Proposed-PCER (P-PCER) adders with n = 16 in a 180 nm Application Specific Integrated Circuit (ASIC) PDK technology show PDP reductions of 44.2% and 41.7% and ADP reductions of 43.4% and 40.7%, respectively, compared with the best of the recent approximate designs considered. The functional and driving effectiveness of the proposed adders is examined through digital image processing applications.
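The split described above (a bitwise OR on the lower n/2 bits, exact addition on the upper n/2 bits) can be sketched in software as follows. This is a behavioural illustration only. It assumes no carry is passed from the approximate part into the exact part, and it does not model the parallel-carry blocks or the Error Recovery module. The name `approx_add` is hypothetical.

```python
def approx_add(a, b, n=16):
    """Behavioural model: OR the lower n/2 bits, add the upper n/2 bits exactly."""
    k = n // 2
    low_mask = (1 << k) - 1
    # Approximate part: bitwise OR, so no carry propagates between bit positions.
    low = (a | b) & low_mask
    # Exact part: ordinary addition of the upper halves (carry-in assumed to be 0).
    high = ((a >> k) + (b >> k)) << k
    return high | low
```

Since x + y = (x | y) + (x & y), the error of this model equals the bitwise AND of the two lower halves, which is at most 2^(n/2) - 1. That is consistent with the bound of 2^(n/2) quoted in the abstract.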
In recent years, error recovery circuits have been adopted in optimized data path units designed with the approximate computing methodology. In this paper, novel multipliers are built from two newly proposed 4:2 approximate compressors that generate an Error-free Sum (ES) and an Error-free Carry (EC), respectively. The Proposed-ES and Proposed-EC 4:2 compressors are used for Partial Product (PP) compression. The structural arrangement uses Dadda-structure-based PPs. Owing to the regularity of its PP arrangement, the Dadda multiplier is chosen for the compressor implementation, which favors easy standard-cell ASIC design. The proposed compressors are used in the n least significant columns, where they are most effective, and accurate compressors are used in the remaining most significant columns. This limits the error in the multiplier output to no more than 2^n for an n×n multiplication. The choice between the proposed compressors is based on the significance of the sum and carry signals for the multiplier result. As an enhancement to the proposed multiplier, we introduce two Area Efficient (AE) variants, namely Proposed-AE (P-AE) and P-AE with Error Recovery (P-AEER). The designs are implemented and their performance is validated using Cadence Encounter with 90 nm ASIC technology. The proposed basic, P-AE, and P-AEER designs exhibit PDP reductions of 46.7%, 52.9%, and 52.7%, respectively, compared with a minimal-error approximate multiplier. The proposed multiplier is applied to digital image processing, where it yields a Structural SIMilarity Index (SSIM) of at least 0.9810, and to an ECG signal processing application, where it shows less than 3% deviation.
The calculation of square roots is a frequently used operation in the control systems of power electronics for applications such as motor drives and power converters. At the same time, executing this operation places a significant load on the microcontroller and consumes processing power that could be used for other important tasks. It therefore restricts the amount of code the microcontroller can process and compels developers to limit the number of functions or to decrease the execution frequency of the program. The calculation of square roots is thus a bottleneck in the implementation of high-performance control systems, and effective optimization of this task is extremely important in modern, efficient devices. Because many applications do not need a precise square root, execution time can be reduced by decreasing the precision of the result. The proposed technique is based on approximating a parabola with a hyperbola, which makes it possible to find the approximate value of a square root rapidly. Since many digital signal processors (DSPs) are not equipped with an efficient divider, the developed algorithm does not use divisions, so it can be executed faster. The cost of this optimization is an approximation error of at most 0.5%, which is nevertheless acceptable for the overwhelming majority of control systems.
Image bitmaps, i.e., data containing pixels and visual perception, have been widely used in emerging applications for pixel operations, while consuming large amounts of memory space and energy. Compared with legacy DRAM (dynamic random access memory), non-volatile memories (NVMs) are suitable for bitmap storage because of their high density and intrinsic durability. However, writes to NVMs suffer from higher energy consumption and latency than reads. Existing precise or approximate compression schemes in NVM controllers show limited performance for bitmaps because of the irregular data patterns and variance in bitmaps. We observe pixel-level similarity when writing bitmaps, owing to the analogous contents of adjacent pixels. By exploiting this pixel-level similarity, we propose SimCom, an approximate similarity-aware compression scheme in the NVM module controller, to efficiently compress data for each write access on the fly. The idea behind SimCom is to compress continuous similar words into pairs of base words and runs. The storage cost of small runs is further mitigated by reusing the least significant bits of base words. SimCom adaptively selects an appropriate compression mode for various bitmap formats, thus achieving an efficient trade-off between quality and memory performance. We implement SimCom on GEM5/zsim with NVMain and evaluate its performance with real-world image/video workloads. Our results demonstrate the efficacy and efficiency of SimCom, with an efficient quality-performance trade-off.
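The core idea of compressing continuous similar words into (base word, run) pairs can be illustrated with a lossy run-length encoder. This is only a sketch of the general principle, not the SimCom scheme itself. The similarity test (absolute difference within a `threshold`), the function names, and the plain list-of-integers word model are assumptions for illustration. The reuse of least significant bits for small runs and the adaptive mode selection are not modelled.

```python
def compress_similar(words, threshold):
    """Lossy run-length encoding: consecutive words within `threshold` of the
    current base word are folded into one (base, run_length) pair."""
    pairs = []
    for w in words:
        if pairs and abs(w - pairs[-1][0]) <= threshold:
            base, run = pairs[-1]
            pairs[-1] = (base, run + 1)   # similar enough: extend the current run
        else:
            pairs.append((w, 1))          # start a new run with w as its base
    return pairs

def decompress(pairs):
    """Expand (base, run_length) pairs; every word in a run becomes its base."""
    return [base for base, run in pairs for _ in range(run)]
```

With `threshold = 0` the scheme degenerates to exact run-length encoding. Raising the threshold trades reconstruction quality for fewer words written, which is the quality-performance trade-off the abstract describes.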
Many properties of natural fractures are uncertain, such as their spatial distribution, petrophysical properties, and fluid flow performance. Bayes' theorem provides a framework for quantifying the uncertainty in geological modeling and flow simulation, and hence for supporting reservoir performance predictions. The application of Bayesian methods to fractured reservoirs has mostly been limited to synthetic cases. In field applications, however, one of the main problems is that the Bayesian prior is falsified because it fails to predict past reservoir production data. In this paper, we show how a global sensitivity analysis (GSA) can be used to identify why the prior is falsified. We then employ an approximate Bayesian computation (ABC) method combined with a tree-based surrogate model to match the production history. We apply these two approaches to a complex fractured oil and gas reservoir where all uncertainties are jointly considered, including the petrophysical properties, rock physics properties, fluid properties, discrete fracture parameters, and dynamics of pressure and transmissibility. We successfully identify several reasons for the falsification. The results show that the proposed methods are effective in quantifying uncertainty in the modeling and flow simulation of a fractured reservoir. The uncertainties of key parameters, such as fracture aperture and fault conductivity, are reduced.
In this paper, approximate Bayesian computation is combined with particle swarm optimization and sequential Monte Carlo methods to identify the parameters of the Mathieu-van der Pol-Duffing chaotic energy harvester system. The proposed method is applied to estimate the coefficients of the chaotic model, and the response paths generated with the identified coefficients are compared with the observed ones, which verifies the effectiveness of the proposed method. Finally, partial response samples of the regular and chaotic responses, as determined by the maximum Lyapunov exponent, are used to detect whether chaotic motion occurs in them by means of a 0-1 test. This paper can provide a reference for data-based parameter identification and chaos prediction in chaotic vibration energy harvester systems.
Understanding speciation has long been a fundamental goal of evolutionary biology. It is widely accepted that speciation requires an interruption of gene flow to generate strong reproductive isolation between species. How speciation in sexually dichromatic species operates in the face of gene flow remains an open question. Two species in the genus Chrysolophus, the Golden Pheasant (C. pictus) and Lady Amherst's Pheasant (C. amherstiae), both of which exhibit significant plumage dichromatism, are currently parapatric in southwestern China, with several hybrids recorded in the field. In this study, we estimated the pattern of gene flow during the speciation of the two pheasants using the Approximate Bayesian Computation (ABC) method based on data from multiple genes. Using a newly assembled de novo genome of Lady Amherst's Pheasant and resequencing of widely distributed individuals, we reconstructed the demographic history of the two pheasants with the PSMC (pairwise sequentially Markovian coalescent) method. The results provide clear evidence that gene flow between the two pheasants was consistent with the predictions of the isolation-with-migration model during divergence, indicating long-term gene flow after the initial divergence (ca. 2.2 million years ago). The data further support the occurrence of secondary contact between the parapatric populations since around 30 kya, with recurrent gene flow to the present, a pattern that may have been induced by the population expansion of the Golden Pheasant in the late Pleistocene. The results support a scenario of speciation between the Golden Pheasant and Lady Amherst's Pheasant with cycles of mixing-isolation-mixing, possibly due to the dynamics of the geographical context in the late Pleistocene. The two species provide a good research system as an evolutionary model for testing reinforcement selection in speciation.
Distributed computing frameworks are a fundamental component of distributed computing systems. They provide an essential means of supporting the efficient processing of big data on clusters or in the cloud. The size of big data is increasing faster than the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate for big data analysis tasks, which often require running complex analytical algorithms on extremely large data sets measured in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and a limited set of analytical algorithms, because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to overcome these challenges. In this paper, we review the MapReduce-type distributed computing frameworks currently used for handling big data and discuss their problems in big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome these big data analysis challenges.
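As background for the MapReduce programming model discussed above, here is a minimal single-process sketch of its three phases (map, shuffle, reduce), using the customary word-count example. It is not tied to any particular framework, and all function names are assumptions for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (key, value) pair for every word in one input split."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values by key; in a cluster this is the network-heavy step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

An algorithm fits this model only if it can be phrased as independent per-record maps followed by per-key reductions. Iterative or inherently serial algorithms need many such rounds, each paying the I/O and communication cost of the shuffle, which is the inefficiency the abstract points to.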
The development of the IoT (Internet of Things) calls for energy- and area-efficient circuit designs for edge devices. Approximate computing, which trades unnecessary computation precision for hardware cost savings, is a promising direction for error-tolerant applications. Multipliers, as frequently invoked basic modules with non-trivial hardware costs, have been approximated to achieve distinct energy and area savings in data-intensive applications. In this paper, we propose a fixed-point approximate multiplier that employs a linear mapping technique, which enables configurable approximation levels and unbiased computation errors. We then introduce a dynamic truncation method into the proposed multiplier design to cover a wider and more fine-grained configuration range of approximation, allowing more flexible hardware cost savings. In addition, a novel normalization module is proposed for the required shifting operations, which balances the occupied area and the critical path delay compared with normal shifters. The errors introduced by the proposed design are analyzed and expressed by formulas that are validated by experimental results. Experimental evaluations show that, compared with accurate multipliers, the proposed approximate multiplier design provides maximum area and power savings of up to 49.70% and 66.39%, respectively, with acceptable computation errors.
As a primary computation unit, a processing element (PE) is key to the energy efficiency of a convolutional neural network (CNN) accelerator. Taking advantage of the inherent error tolerance of CNNs, approximate computing with high hardware efficiency has been considered for implementing the computation units of CNN accelerators. However, individual approximate designs such as multipliers and adders can achieve only limited accuracy and hardware improvements. In this paper, an approximate PE is specifically devised for CNN accelerators by jointly considering data representation, multiplication, and accumulation. An approximate data format is defined for the weights using stochastic rounding. This data format enables a simple implementation of multiplication using small lookup tables, an adder, and a shifter. Two approximate accumulators are further proposed for the product accumulation in the PE. Compared with the exact 8-bit fixed-point design, the proposed PE saves more than 29% and 20% in power-delay product for 3×3 and 5×5 sums of products, respectively. Also, compared with PEs consisting of state-of-the-art approximate multipliers, the proposed design shows a significantly smaller error bias with lower hardware overhead. Moreover, the application of the approximate PEs in CNN accelerators is analyzed by implementing a multi-task CNN for face detection and alignment. We conclude that 1) an approximate PE is more effective for face detection than for alignment, 2) an approximate PE with high statistically measured accuracy does not necessarily result in good quality in face detection, and 3) properly increasing the number of PEs in a CNN accelerator can improve its power and energy efficiency.
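Stochastic rounding, which the abstract uses to define the weight format, rounds a value up with probability equal to its fractional part, so the rounding error is zero on average. The following is a minimal sketch of rounding to the integer grid. The paper's actual weight format is not given in the abstract, so the function name, the integer grid, and the optional `rng` argument are assumptions for illustration.

```python
import math
import random

def stochastic_round(x, rng=random):
    """Round x to a neighbouring integer; P(round up) equals the fractional part of x."""
    lower = math.floor(x)
    fraction = x - lower
    # rng.random() is uniform in [0, 1), so an exact integer is never rounded up.
    return lower + (1 if rng.random() < fraction else 0)
```

For example, 2.3 becomes 3 about 30% of the time and 2 otherwise, so the expected value of the result is 2.3. This unbiasedness is what keeps the accumulated error bias small when many rounded weights are summed.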
Carbon nanotube field-effect transistors (CNTFETs) are reliable alternatives to conventional transistors, especially for use in approximate computing (AC)-based error-resilient digital circuits. In this paper, CNTFET technology and the gate diffusion input (GDI) technique are merged, and three new AC-based full adders (FAs) are presented with 6, 6, and 8 transistors, respectively. The nondominated sorting genetic algorithm II (NSGA-II) is used to attain the optimal performance of the proposed cells, taking the number of tubes and the chirality vectors as its variables. The results confirm an improvement of about 50% in power-delay product (PDP) at the cost of area occupation. The Monte Carlo method (MCM) and 32-nm CNTFET technology are used to evaluate the lithographic variations and the stability of the proposed circuits during the fabrication process, and the proposed circuits are observed to be more stable than those in the literature. The dynamic threshold (DT) technique applied to the transistors of the proposed circuits corrects the possible voltage drop at the outputs. The circuit performance and error metrics of the proposed circuits make them candidates for the least significant bit (LSB) parts of more complex arithmetic circuits such as multipliers.
Numerical approximate computations can solve large and complex problems quickly and have the advantage of high efficiency. However, they give only approximate results, whereas some fields require exact results. There is a gap between approximate computations and exact results. In this paper, we build a bridge by which exact results can be obtained through numerical approximate computations.
Owing to its flexibility and feasibility in addressing ill-posed problems, the Bayesian method has been widely used in inverse heat conduction problems (IHCPs). However, in real science and engineering IHCPs, the likelihood function of the Bayesian method is commonly computationally expensive or analytically unavailable. In this study, to circumvent this intractable likelihood function, approximate Bayesian computation (ABC) is extended to IHCPs. In ABC, the high-dimensional observations in the intractable likelihood function are represented by their low-dimensional summary statistics. The performance of ABC therefore depends on the selection of summary statistics. In this study, a machine learning-based ABC (ML-ABC) is proposed to address the complicated selection of summary statistics. The Auto-Encoder (AE) is a powerful machine learning (ML) framework that can compress the observations into very low-dimensional summary statistics with little information loss. In addition, to accelerate the calculation of the proposed framework, another neural network (NN) is used to construct the mapping between the unknowns and the summary statistics. With this mapping, given arbitrary unknowns, the summary statistics can be obtained efficiently without solving the time-consuming forward problem numerically. Furthermore, an adaptive nested sampling method (ANSM) is developed to further improve sampling efficiency. The performance of the proposed method is demonstrated with two IHCP cases.
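The role of summary statistics in ABC is easiest to see in the basic rejection variant, sketched below on a toy problem: inferring the mean of a unit-variance Gaussian. This is not the ML-ABC method of the abstract, which replaces the hand-picked summary with an auto-encoder, the simulator with a neural-network surrogate, and rejection with adaptive nested sampling. The uniform prior on [0, 10], the sample mean as summary statistic, the tolerance `eps`, and all names are assumptions for illustration.

```python
import random
import statistics

def simulate_summary(mu, n_obs, rng):
    """Forward model plus summary statistic: the sample mean of n_obs draws from N(mu, 1)."""
    return statistics.fmean(rng.gauss(mu, 1.0) for _ in range(n_obs))

def abc_rejection(observed_summary, n_draws=5000, n_obs=50, eps=0.2, seed=0):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(0.0, 10.0)  # draw a candidate from the (assumed) prior
        if abs(simulate_summary(mu, n_obs, rng) - observed_summary) < eps:
            accepted.append(mu)      # no likelihood is ever evaluated
    return accepted

posterior_samples = abc_rejection(observed_summary=3.0)
posterior_mean = statistics.fmean(posterior_samples)
```

The accepted candidates approximate the posterior without a likelihood evaluation. The quality of that approximation depends on how informative the summary statistic is, and the cost is dominated by the forward simulations, which are the two points the abstract's method targets.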
Funding: Supported by the National Natural Science Foundation of China (Nos. 61432017 and 61772327), the Natural Science Foundation of Shanghai (Nos. 20ZR1455900 and 20ZR1421600), the Qi'anxin National Engineering Laboratory for Big Data Collaborative Security Technology Open Project (No. QAX-201803), and the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (No. CARCHA202005).
Abstract: In recent years, Approximate Computing Circuits (ACCs) have been widely used in applications with intrinsic tolerance to errors. With the increased availability of approximate computing circuit approaches, reliability analysis methods for assessing their fault vulnerability have become highly necessary. In this study, two accurate reliability evaluation methods for approximate computing circuits are proposed. The reliability of approximate computing circuits is calculated on the basis of the iterative Probabilistic Transfer Matrix (PTM) model. During the calculation, correlation coefficients are derived and combined to deal with the correlation problem caused by fanout reconvergence. The accuracy and scalability of the two methods are verified using three sets of approximate computing circuit instances and further circuits from EvoApprox8b, an open-source library of approximate computing circuits. Experimental results show that, relative to Monte Carlo simulation, the two methods achieve average error rates of 0.46% and 1.29% and time overheads of 0.002% and 0.1%. Compared with existing approaches to reliability estimation for approximate computing circuits based on the original PTM model, the proposed methods reduce the space overheads by nearly 50% and achieve time overheads of 1.78% and 2.19%.
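For readers unfamiliar with PTMs, the following is a minimal sketch of the classic (non-iterative) PTM calculation on a toy circuit, y = NAND(NOT a, NOT b). It is not the method proposed in the abstract: the circuit, the single gate error probability `eps`, the uniform input distribution, and all function names are assumptions chosen for illustration. The toy circuit has no reconvergent fanout, so no correlation handling is needed.

```python
def gate_ptm(truth, eps):
    """PTM of a gate: row = input combination, column = output value (0 or 1).
    The gate produces the wrong output with probability eps (assumed model)."""
    return [[(1.0 - eps) if out == t else eps for out in (0, 1)] for t in truth]

def matmul(a, b):
    """Series composition of two circuit levels."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def kron(a, b):
    """Parallel composition of two independent gates (Kronecker product)."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

NOT_TRUTH = [1, 0]            # fault-free output for inputs 0, 1
NAND_TRUTH = [1, 1, 1, 0]     # fault-free output for inputs 00, 01, 10, 11
OR_TRUTH = [0, 1, 1, 1]       # NAND(NOT a, NOT b) equals a OR b

def circuit_reliability(eps):
    """Probability that the toy circuit output is correct, inputs uniform."""
    inv = gate_ptm(NOT_TRUTH, eps)
    level1 = kron(inv, inv)               # 4x4: two inverters side by side
    level2 = gate_ptm(NAND_TRUTH, eps)    # 4x2: the NAND gate
    ptm = matmul(level1, level2)          # 4x2: PTM of the whole circuit
    return sum(ptm[i][OR_TRUTH[i]] for i in range(4)) / 4.0

if __name__ == "__main__":
    print(circuit_reliability(0.01))
```

The matrix sizes double with every added input, which is the space overhead that motivates iterative variants of the model.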
Funding: Supported by the Fundamental Research Funds for the Central Universities of China under Grant No. BLX202015, the Beijing Municipal Natural Science Foundation under Grant No. 6222038, and the National Natural Science Foundation of China under Grant No. 92164203.
Abstract: Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers. Researchers have conventionally concentrated on the tradeoffs between energy and performance in circuit and system design based on accurate computing. However, as video/image processing and machine learning algorithms have become widespread, approximate computing in these applications has become a hot topic. The errors caused by approximate computing can be tolerated by these applications with specific processing or algorithms, and large improvements in performance or power savings can be achieved with some acceptable loss in final output quality. This paper presents a survey of approximate computing from arithmetic unit design to high-level applications, in which we try to give researchers a comprehensive and insightful understanding of approximate computing. We believe that approximate computing will play an important role in circuit and system design in the future, especially with the rapid development of artificial intelligence algorithms and their related applications.
Funding: Supported in part by the National Natural Science Foundation of China under Grant Nos. 62172349, 62032020, and 62172350, the Research Foundation of Education Bureau of Hunan Province under Grant No. 21B0139, the National Key Research and Development Program of China under Grant 2021YFB3101200, and the Hunan Science and Technology Planning Project under Grant No. 2019RS3019.
Abstract: In edge computing, a reasonable edge resource bidding mechanism can enable edge providers and users to obtain benefits in a relatively fair fashion. To maximize such benefits, this paper proposes a dynamic multi-attribute resource bidding mechanism (DMRBM). Most previous work relies on a third-party agent to exchange information to gain optimal benefits. It is worth noting that when edge providers and users trade with third-party agents that are not entirely reliable and trustworthy, their sensitive information is prone to leakage. Moreover, the privacy protection of edge providers and users must be considered in the dynamic pricing/transaction process, which is also very challenging. Therefore, this paper first adopts a privacy protection algorithm to prevent sensitive information from leaking. On the premise that the sensitive data of both edge providers and users are protected, the prices of providers fluctuate within a certain range. Users can then choose appropriate edge providers by the price-performance ratio (PPR) standard and the reward of lower price (LPR) standard according to their demands. The two standards are expressed by two evaluation functions. Furthermore, this paper employs an approximate computing method to obtain an approximate solution of DMRBM in polynomial time. Specifically, this paper models the bidding process as a non-cooperative game and obtains the approximate optimal solution based on the two standards according to game theory. Through extensive experiments, this paper demonstrates that the DMRBM satisfies individual rationality, budget balance, and privacy protection, and that it can also increase the task offloading rate and the system benefits.
Abstract: This paper proposes a hardware-efficient implementation of division, which is useful for image processing in WSN edge devices. For error-resilient applications such as image processing, accurate calculations can be unnecessary overhead, and approximate computing that obtains circuit benefits from inaccurate calculations is effective. Since there are studies showing sufficient performance with few bit operations, this paper proposes a combinational arithmetic circuit design of 16 bits or less. The proposed design is an approximate restoring division circuit implemented with a 2-dimensional array of 1-bit subtractor cells. The main drawback of such a design is the long “borrow-chain” that traverses all of the rows of the 2-dimensional subtractor array before a final stable quotient result can be produced, thereby resulting in a long delay and excessive power dissipation. This paper proposes two approximate subtractor cell designs, named ABSC and ADSC, that break this borrow chain: the first in the vertical direction and the second in the horizontal direction. The proposed approximate divider designs are compared with an accurate design and previous state-of-the-art designs based on accuracy and hardware overhead. The proposed designs have accuracy levels that are close to the best accuracy levels achieved by previous state-of-the-art approximate divider designs. In addition, the proposed ADSC design had the lowest delay, area, and power characteristics. Finally, the implementation of both proposed designs for two practical applications showed that both designs provide sufficient division accuracy.
Abstract: The online computational burden of model predictive control (MPC) for large-scale constrained systems hampers its real-time application and limits it to slow dynamic processes with a moderate number of inputs. To avoid this, an efficient and fast algorithm based on aggregation optimization is proposed in this paper. It optimizes only the current control action at time instant k, while the other future control sequences in the optimization horizon are approximated offline by a linear feedback control sequence, so the online optimization can be converted into a low-dimensional quadratic programming problem. Input constraints can be handled well in this scheme. Performance comparable to the existing standard model predictive control algorithm is achieved. Simulation results demonstrate its effectiveness.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 62076168 and 61772350), the Beijing Nova Program (No. Z181100006218093), and the Research Fund from Beijing Innovation Center for Future Chips (No. KYJJ2018008).
Abstract: The demise of Dennard’s scaling has created both power and utilization wall challenges for computer systems. Because transistors operating in the near-threshold region can obtain flexible trade-offs between power and performance, near-threshold operation is regarded as an alternative solution to the scaling challenge. A reduction in supply voltage nevertheless creates significant reliability challenges, while maintaining an error-free system incurs high costs in both performance and energy consumption. The main purpose of research on computer architecture has therefore shifted from performance improvement to complex multi-objective optimization. In this paper, we propose a three-dimensional optimization approach which can effectively identify the best system configuration to establish a balance among performance, energy, and reliability. We use a dynamic programming algorithm to determine the proper voltage and approximation level based on three predictors: system performance, energy consumption, and output quality. We propose an output quality predictor which uses a hardware/software co-design fault injection platform to evaluate the impact of errors on output quality under near-threshold computing (NTC). Evaluation results demonstrate that our approach can lead to a 28% improvement in output quality with a 10% drop in overall energy efficiency; this translates to an approximately 20% average improvement in accuracy, power, and performance.
Abstract: Approximate computing has received significant attention in the design of portable CMOS hardware for error-tolerant applications. This work proposes an approximate adder that optimizes area and delay and achieves energy efficiency using Parallel Carry (PC) generation logic. For an n-bit input, the proposed algorithm uses approximate addition for the n/2 least significant bits and exact addition for the n/2 most significant bits. Simple OR logic with no carry propagation is used to implement the approximate part. In the exact part, addition is performed using 4-bit adder blocks that implement PC at block level to reduce node capacitance in the critical path. Evaluations reveal that the maximum error of the proposed adder does not exceed 2^(n/2). As an enhancement of the proposed algorithm, an Error Recovery (ER) module is used to reduce the average error. Synthesis results of the Proposed-PC (P-PC) and Proposed-PCER (P-PCER) adders with n = 16 in a 180 nm Application Specific Integrated Circuit (ASIC) PDK technology show PDP reductions of 44.2% and 41.7% and ADP reductions of 43.4% and 40.7%, respectively, compared to the best recent approximate design considered. The functional and driving effectiveness of the proposed adders is examined through digital image processing applications.
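The split described above (OR logic on the low half, exact addition on the high half) can be modelled behaviourally in a few lines. This is a minimal sketch, not the authors' circuit: it assumes that no carry passes from the approximate half into the exact half, and it does not model the parallel-carry blocks or the Error Recovery module. The function name `approx_add` is hypothetical.

```python
def approx_add(a: int, b: int, n: int = 16) -> int:
    """Behavioural model of an n-bit adder whose n/2 least significant bits
    are combined with bitwise OR and whose n/2 most significant bits are
    added exactly (assumption: carry-in to the exact half is always 0)."""
    half = n // 2
    mask = (1 << half) - 1
    low = (a & mask) | (b & mask)        # approximate part: no carry chain
    high = (a >> half) + (b >> half)     # exact part, may produce a carry-out
    return (high << half) | low

def max_error(n: int) -> int:
    """Exhaustively measure the largest absolute error for n-bit operands."""
    return max(
        abs((a + b) - approx_add(a, b, n))
        for a in range(1 << n)
        for b in range(1 << n)
    )

if __name__ == "__main__":
    print(approx_add(3, 1, 8), max_error(8))
```

Under this model the error equals the bitwise AND of the two low halves, because x + y = (x | y) + (x & y); it is therefore always below 2^(n/2), in line with the bound quoted in the abstract.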
Abstract: In recent years, error recovery circuits in optimized data path units have been adopted with the approximate computing methodology. In this paper, novel multipliers make effective use of two newly proposed 4:2 approximate compressors that generate an Error-free Sum (ES) and an Error-free Carry (EC). The proposed ES and EC 4:2 compressors are used for Partial Product (PP) compression. The structural arrangement uses a Dadda-structure-based PP. The Dadda multiplier is chosen for compressor implementation because of the regularity of its PP arrangement, which favors easy standard-cell ASIC design. The proposed compressors are used in the n least significant columns, and an accurate compressor is used in the remaining most significant columns. This limits the error in the multiplier output to not more than 2n for an n × n multiplication. The choice among the proposed compressors is decided based on the significance of the sum and carry signals to the multiplier result. As an enhancement to the proposed multiplier, two Area Efficient (AE) variants are introduced, namely Proposed-AE (P-AE) and P-AE with Error Recovery (P-AEER). Design and performance validation are carried out using Cadence Encounter with 90 nm ASIC technology. The proposed basic, P-AE, and P-AEER designs exhibit PDP reductions of 46.7%, 52.9%, and 52.7%, respectively, compared to a minimal-error approximate multiplier. The proposed multiplier is applied to digital image processing, where it achieves a Structural SIMilarity Index (SSIM) of at least 0.9810, and to ECG signal processing, where it shows less than 3% deviation.
Abstract: The calculation of square roots is a frequently used operation in control systems of power electronics for different applications such as motor drives and power converters. At the same time, this procedure significantly loads the microcontroller and consumes processing power that could be used for other important tasks. It therefore restricts the size of the code that the microcontroller can process and compels developers to limit the number of functions or to decrease the execution frequency of a program. The calculation of square roots is thus a bottleneck in the implementation of high-performance control systems, and effective optimization of this task is extremely important in modern, efficient devices. Because many applications do not need precise calculation of square roots, execution time can be optimized by decreasing the precision of the result. The proposed technique is based on approximating a parabola with a hyperbola, which makes it possible to find the approximate value of a square root rapidly. Since many digital signal processors (DSPs) are not equipped with an effective divider, the developed algorithm does not use divisions, so it can be executed faster. The cost of this optimization is an approximation error with a maximum of 0.5%, which is acceptable for the overwhelming majority of control systems.
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 62125202 and U22B2022.
Abstract: Image bitmaps, i.e., data containing pixels and visual perception, have been widely used in emerging applications for pixel operations while consuming large amounts of memory space and energy. Compared with legacy DRAM (dynamic random access memory), non-volatile memories (NVMs) are suitable for bitmap storage due to their salient features of high density and intrinsic durability. However, writes to NVMs suffer from higher energy consumption and latency than read accesses. Existing precise or approximate compression schemes in NVM controllers show limited performance for bitmaps due to the irregular data patterns and variance in bitmaps. We observe pixel-level similarity when writing bitmaps due to the analogous contents in adjacent pixels. By exploiting the pixel-level similarity, we propose SimCom, an approximate similarity-aware compression scheme in the NVM module controller, to efficiently compress data for each write access on the fly. The idea behind SimCom is to compress consecutive similar words into pairs of base words and runs. The storage costs for small runs are further mitigated by reusing the least significant bits of base words. SimCom adaptively selects an appropriate compression mode for various bitmap formats, thus achieving an efficient trade-off between quality and memory performance. We implement SimCom on GEM5/zsim with NVMain and evaluate the performance with real-world image/video workloads. Our results demonstrate the efficacy and efficiency of SimCom with an efficient quality-performance trade-off.
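The base-word-plus-run idea can be illustrated with a lossy run-length encoder. This is a minimal sketch under stated assumptions, not SimCom itself: the similarity test (absolute difference from the base word within a `threshold`), the function names, and the word values are all hypothetical, and the reuse of least significant bits for small runs and the adaptive mode selection are not modelled.

```python
def approx_rle_encode(words, threshold):
    """Collapse consecutive words that are within `threshold` of the current
    base word into (base, run) pairs. Lossy: similar words become the base."""
    pairs = []
    for w in words:
        if pairs and abs(w - pairs[-1][0]) <= threshold:
            pairs[-1][1] += 1            # extend the current run
        else:
            pairs.append([w, 1])         # start a new run with w as base word
    return [(base, run) for base, run in pairs]

def approx_rle_decode(pairs):
    """Expand (base, run) pairs back into a word sequence."""
    return [base for base, run in pairs for _ in range(run)]

if __name__ == "__main__":
    pixels = [100, 101, 99, 200, 201]
    encoded = approx_rle_encode(pixels, threshold=2)
    print(encoded)                       # [(100, 3), (200, 2)]
    print(approx_rle_decode(encoded))    # [100, 100, 100, 200, 200]
```

Because each word is compared with the base word rather than with its predecessor, the per-word error after decoding never exceeds the threshold.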
Abstract: Many properties of natural fractures are uncertain, such as their spatial distribution, petrophysical properties, and fluid flow performance. Bayes' theorem provides a framework to quantify the uncertainty in geological modeling and flow simulation, and hence to support reservoir performance predictions. The application of Bayesian methods to fractured reservoirs has mostly been limited to synthetic cases. In field applications, however, one of the main problems is that the Bayesian prior is falsified because it fails to predict past reservoir production data. In this paper, we show how a global sensitivity analysis (GSA) can be used to identify why the prior is falsified. We then employ an approximate Bayesian computation (ABC) method combined with a tree-based surrogate model to match the production history. We apply these two approaches to a complex fractured oil and gas reservoir where all uncertainties are jointly considered, including the petrophysical properties, rock physics properties, fluid properties, discrete fracture parameters, and dynamics of pressure and transmissibility. We successfully identify several reasons for the falsification. The results show that the methods we propose are effective in quantifying uncertainty in the modeling and flow simulation of a fractured reservoir. The uncertainties of key parameters, such as fracture aperture and fault conductivity, are reduced.
Funding: This work is supported by the National Natural Science Foundation of China (Nos. 11972019 and 12102237).
Abstract: In this paper, approximate Bayesian computation is combined with particle swarm optimization and sequential Monte Carlo methods to identify the parameters of the Mathieu-van der Pol-Duffing chaotic energy harvester system. The proposed method is then applied to estimate the coefficients of the chaotic model, and the response output paths obtained with the identified coefficients are compared with the observed ones, which verifies the effectiveness of the proposed method. Finally, partial response samples of the regular and chaotic responses, determined by the maximum Lyapunov exponent, are used to detect whether chaotic motion occurs in them by a 0-1 test. This paper can provide a reference for data-based parameter identification and chaotic prediction of chaotic vibration energy harvester systems.
Funding: Supported by the National Natural Science Foundation of China (No. 31471987); approved by the College of Life Sciences, Beijing Normal University (No. CLSEAW-2013-007).
Abstract: Understanding speciation has long been a fundamental goal of evolutionary biology. It is widely accepted that speciation requires an interruption of gene flow to generate strong reproductive isolation between species. How speciation in sexually dichromatic species operates in the face of gene flow remains an open question. Two species in the genus Chrysolophus, the Golden Pheasant (C. pictus) and Lady Amherst’s Pheasant (C. amherstiae), both of which exhibit significant plumage dichromatism, are currently parapatric in southwestern China, with several hybrids recorded in the field. In this study, we estimated the pattern of gene flow during the speciation of the two pheasants using the Approximate Bayesian Computation (ABC) method based on data from multiple genes. Using a newly assembled de novo genome of Lady Amherst’s Pheasant and resequencing of widely distributed individuals, we reconstructed the demographic history of the two pheasants by the PSMC (pairwise sequentially Markovian coalescent) method. The results provide clear evidence that the gene flow between the two pheasants was consistent with the predictions of the isolation-with-migration model during divergence, indicating that there was long-term gene flow after the initial divergence (ca. 2.2 million years ago). The data further support the occurrence of secondary contact between the parapatric populations since around 30 kya with recurrent gene flow to the present, a pattern that may have been induced by the population expansion of the Golden Pheasant in the late Pleistocene. The results of the study support a scenario of speciation between the Golden Pheasant and Lady Amherst’s Pheasant with cycles of mixing-isolation-mixing, possibly due to the dynamics of the geographical context in the late Pleistocene. The two species provide a good research system as an evolutionary model for testing reinforcement selection in speciation.
Funding: Supported by the National Natural Science Foundation of China (No. 61972261) and the Basic Research Foundations of Shenzhen (Nos. JCYJ20210324093609026 and JCYJ20200813091134001).
Abstract: Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or in the cloud. The size of big data increases faster than the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely large data sets measured in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and a limited set of analytical algorithms, because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to overcome these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFE0126300 and the National Natural Science Foundation of China under Grant Nos. 62034007 and 62141404.
Abstract: The development of the IoT (Internet of Things) calls for circuit designs with energy and area efficiency for edge devices. Approximate computing, which trades unnecessary computation precision for hardware cost savings, is a promising direction for error-tolerant applications. Multipliers, as frequently invoked basic modules that consume non-trivial hardware costs, have been approximated to achieve distinct energy and area savings for data-intensive applications. In this paper, we propose a fixed-point approximate multiplier that employs a linear mapping technique, which enables the configurability of approximation levels and the unbiasedness of computation errors. We then introduce a dynamic truncation method into the proposed multiplier design to cover a wider and more fine-grained configuration range of approximation for more flexible hardware cost savings. In addition, a novel normalization module is proposed for the required shifting operations, which balances the occupied area and the critical path delay compared with normal shifters. The errors introduced by our proposed design are analyzed and expressed by formulas, which are validated by experimental results. Experimental evaluations show that, compared with accurate multipliers, our proposed approximate multiplier design provides maximum area and power savings of up to 49.70% and 66.39%, respectively, with acceptable computation errors.
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 62104127 and the National Key Research and Development Program of China under Grant No. 2022YFB4500200.
Abstract: As a primary computation unit, a processing element (PE) is key to the energy efficiency of a convolutional neural network (CNN) accelerator. Taking advantage of the inherent error tolerance of CNNs, approximate computing with high hardware efficiency has been considered for implementing the computation units of CNN accelerators. However, individual approximate designs such as multipliers and adders can only achieve limited accuracy and hardware improvements. In this paper, an approximate PE is devised specifically for CNN accelerators by synergistically considering the data representation, multiplication, and accumulation. An approximate data format is defined for the weights using stochastic rounding. This data format enables a simple implementation of multiplication by using small lookup tables, an adder, and a shifter. Two approximate accumulators are further proposed for the product accumulation in the PE. Compared with the exact 8-bit fixed-point design, the proposed PE saves more than 29% and 20% in power-delay product for 3×3 and 5×5 sums of products, respectively. Also, compared with PEs consisting of state-of-the-art approximate multipliers, the proposed design shows significantly smaller error bias with lower hardware overhead. Moreover, the application of the approximate PEs in CNN accelerators is analyzed by implementing a multi-task CNN for face detection and alignment. We conclude that 1) an approximate PE is more effective for face detection than for alignment, 2) an approximate PE with high statistically measured accuracy does not necessarily result in good quality in face detection, and 3) properly increasing the number of PEs in a CNN accelerator can improve its power and energy efficiency.
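Stochastic rounding, which the abstract uses to define the weight format, rounds a value up or down at random with probabilities chosen so that the result is unbiased in expectation. The sketch below shows the generic technique on an integer grid only; the paper's actual approximate data format is not described here, so the grid, the function names, and the sample count are assumptions.

```python
import math
import random

def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x to floor(x) or floor(x) + 1; the probability of rounding up
    equals the fractional part, so the expected result equals x."""
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)

def empirical_mean(x: float, samples: int, seed: int = 0) -> float:
    """Average many stochastic roundings of x to show the absence of bias."""
    rng = random.Random(seed)
    return sum(stochastic_round(x, rng) for _ in range(samples)) / samples

if __name__ == "__main__":
    print(empirical_mean(2.3, 20000))
```

Round-to-nearest would map 2.3 to 2 every time and accumulate a systematic error over many weights, whereas the average of the stochastic roundings approaches 2.3.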
Abstract: Carbon nanotube field-effect transistors (CNTFETs) are reliable alternatives to conventional transistors, especially for use in approximate computing (AC) based error-resilient digital circuits. In this paper, CNTFET technology and the gate diffusion input (GDI) technique are merged, and three new AC-based full adders (FAs) are presented with 6, 6, and 8 transistors, respectively. The nondominated sorting genetic algorithm II (NSGA-II) is used to attain the optimal performance of the proposed cells by considering the number of tubes and chirality vectors as its variables. The results confirm an improvement of about 50% in power-delay product (PDP) at the cost of area occupation. The Monte Carlo method (MCM) and 32-nm CNTFET technology are used to evaluate the lithographic variations and the stability of the proposed circuits during the fabrication process, and the proposed circuits are observed to be more stable than those in the literature. The dynamic threshold (DT) technique in the transistors of the proposed circuits corrects the possible voltage drop at the outputs. The circuit performance and error metrics of the proposed circuits make them candidates for the least significant bit (LSB) parts of more complex arithmetic circuits such as multipliers.
Funding: This work was partially supported by the China 973 Project (Grant No. NKBRPC-2004CB318003) and the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KJCX2-YW-S02).
Abstract: Numerical approximate computations can solve large and complex problems quickly and have the advantage of high efficiency. However, they give only approximate results, whereas exact results are needed in some fields. There is a gap between approximate computations and exact results. In this paper, we build a bridge by which exact results can be obtained from numerical approximate computations.
Abstract: Owing to its flexibility and feasibility in addressing ill-posed problems, the Bayesian method has been widely used in inverse heat conduction problems (IHCPs). However, in real science and engineering IHCPs, the likelihood function of the Bayesian method is commonly computationally expensive or analytically unavailable. In this study, to circumvent this intractable likelihood function, approximate Bayesian computation (ABC) is extended to IHCPs. In ABC, the high-dimensional observations in the intractable likelihood function are replaced by their low-dimensional summary statistics. Thus, the performance of ABC depends on the selection of summary statistics. In this study, a machine learning-based ABC (ML-ABC) is proposed to address the complicated selection of the summary statistics. The Auto-Encoder (AE) is a powerful Machine Learning (ML) framework which can compress the observations into very low-dimensional summary statistics with little information loss. In addition, to accelerate the calculation of the proposed framework, another neural network (NN) is used to construct the mapping between the unknowns and the summary statistics. With this mapping, given arbitrary unknowns, the summary statistics can be obtained efficiently without solving the time-consuming forward problem with a numerical method. Furthermore, an adaptive nested sampling method (ANSM) is developed to further improve the efficiency of sampling. The performance of the proposed method is demonstrated with two IHCP cases.
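The core ABC loop that this abstract builds on can be sketched as plain rejection sampling: draw parameters from the prior, simulate data, and keep the draws whose summary statistic lies close to the observed one. The example below is a toy illustration only and is not ML-ABC: the forward model (Gaussian noise around an unknown mean), the uniform prior, the hand-picked summary statistic (the sample mean), the tolerance, and all names are assumptions. In the abstract, the summary statistic is learned by an auto-encoder and the sampler is an adaptive nested sampler.

```python
import random
import statistics

def abc_rejection(observed, prior_sample, simulate, summary, tol, n_draws, rng):
    """Keep prior draws whose simulated summary is within tol of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= tol:
            accepted.append(theta)
    return accepted

def run_toy_example(seed=0, true_mean=3.0, n_obs=50):
    """Infer the mean of a unit-variance Gaussian without using its likelihood."""
    rng = random.Random(seed)
    observed = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
    accepted = abc_rejection(
        observed,
        prior_sample=lambda r: r.uniform(0.0, 6.0),                      # assumed prior
        simulate=lambda th, r: [r.gauss(th, 1.0) for _ in range(n_obs)],  # toy forward model
        summary=statistics.fmean,                                         # hand-picked summary
        tol=0.1,
        n_draws=5000,
        rng=rng,
    )
    return observed, accepted

if __name__ == "__main__":
    obs, post = run_toy_example()
    print(len(post), statistics.fmean(post))
```

The accepted draws approximate the posterior. A tighter tolerance improves the approximation but lowers the acceptance rate, which is one reason why cheap surrogate forward models and more efficient samplers are attractive.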