An intense laser pulse focused onto a plasma can excite nonlinear plasma waves. Under appropriate conditions, electrons from the background plasma are trapped in the plasma wave and accelerated to ultra-relativistic velocities. This scheme is called a laser wakefield accelerator. In this work, we present results from a laser wakefield acceleration experiment that uses a petawatt-class laser to excite the wakefields and nanoparticles to assist the injection of electrons into the accelerating phase of the wakefields. We find that a 10-cm-long, nanoparticle-assisted laser wakefield accelerator can generate 340 pC, 10 ± 1.86 GeV electron bunches with a 3.4 GeV rms convolved energy spread and a 0.9 mrad rms divergence. It can also produce bunches with lower energies in the 4–6 GeV range.
We present a first on-chip positron accelerator based on dielectric laser acceleration. This approach significantly reduces the physical dimensions of the positron acceleration apparatus, enhancing its feasibility for diverse applications. By utilizing a stacked acceleration structure and far-infrared laser technology, we achieve a seven-stage acceleration structure that surpasses the acceleration distance and energy gain of previous dielectric laser acceleration methods. Additionally, we compress the positron beam to an ultrafast sub-femtosecond scale during the acceleration process, a greater degree of compression than traditional methods achieve. We also demonstrate the robustness of the stacked acceleration structure through the successful acceleration of the positron beam.
The flexibility in radiotherapy can be improved if patients can be moved between any of the department’s medical linear accelerators (LINACs) without the need to change anything in the patient’s treatment plan. For this to be possible, the dosimetric characteristics of the various accelerators must be the same, or nearly the same. The purpose of this work is to further describe and compare measurements and parameters after the initial vendor-recommended beam matching of five LINACs. Deviations related to dose calculations and to beam-matched accelerators may compromise treatment accuracy. The safest and most practical way to ensure that all accelerators are within clinically acceptable accuracy is to include treatment planning system (TPS) calculations in the LINAC matching evaluation. The TPS was used to create three photon plans with field sizes of 3 × 3 cm, 10 × 10 cm, and 25 × 25 cm at a depth of 4.5 cm in Perspex. The calculated TPS plans were sent to Mosaiq to be delivered by the five LINACs. The TPS plans were compared with the measurement data from the five LINACs using gamma analysis with 2% and 2 mm criteria. The results suggest that for four of the five LINACs there was generally good agreement, with less than a 2% deviation between the planned and the measured dose distributions. However, one LINAC, named “Asterix”, exhibited a deviation of 2.121% from the planned dose. The results show that the performance of all the LINACs was within the acceptable deviation and that they deliver radiation dose consistently and accurately.
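The 2% and 2 mm gamma comparison named in this abstract can be illustrated with a minimal one-dimensional sketch. It assumes global normalisation to the maximum planned dose and a brute-force search over planned points; the function names and the sample profile are hypothetical and are not taken from the study.

```python
import math

def gamma_1d(positions_mm, planned, measured, dose_tol=0.02, dta_mm=2.0):
    """Return the gamma index for each measured point of a 1D dose profile.

    Assumption: the dose criterion is a fraction of the maximum planned dose
    (global normalisation), and both profiles share the same positions.
    """
    dose_crit = dose_tol * max(planned)
    gammas = []
    for x_m, d_m in zip(positions_mm, measured):
        # Smallest combined distance/dose disagreement over all planned points.
        best = min(
            math.sqrt(((x_m - x_p) / dta_mm) ** 2 + ((d_m - d_p) / dose_crit) ** 2)
            for x_p, d_p in zip(positions_mm, planned)
        )
        gammas.append(best)
    return gammas

def pass_rate(gammas):
    """Fraction of points with gamma <= 1, i.e. within the 2%/2 mm criteria."""
    return sum(g <= 1.0 for g in gammas) / len(gammas)

# Hypothetical flat profile: one measured point is 3% high and fails.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
plan = [100.0, 100.0, 100.0, 100.0, 100.0]
meas = [101.0, 99.0, 100.0, 103.0, 100.0]
rate = pass_rate(gamma_1d(positions, plan, meas))
```

A point passes when its gamma value is at most 1, so a 3% local deviation on a flat profile fails while 1% deviations pass.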
In recent years, heavy ion accelerator technology has developed rapidly worldwide and has been widely applied in space radiation simulation and particle therapy. The extracted ion beams usually require very high uniformity over the irradiation area, because uniformity directly affects experimental precision and therapeutic effect. Specifically, ultra-large-area, high-uniformity scanning is a crucial requirement for assessing spacecraft radiation effects and serves as a core specification for beamline terminal design. In the 300 MeV proton and heavy ion accelerator complex at the Space Environment Simulation and Research Infrastructure (SESRI), proton and heavy ion beams will be accelerated and ultimately delivered to three irradiation terminals. To achieve the required large irradiation area of 320 mm × 320 mm, horizontal and vertical scanning magnets are used in the extraction beam line. However, given the various requirements for beam species and energies, the tracking accuracy of the power supplies (PSs), the eddy current effect of the scanning magnets, and fluctuations in the ion bunch structure reduce the irradiation uniformity. To mitigate these effects, a beam uniformity optimization method based on the measured beam distribution was proposed and applied in the accelerator complex at SESRI. In the experiment, the uniformity was successfully optimized from 75% to over 90% after five iterations of adjustment to the PS waveforms. This paper introduces the method and the experimental results.
The massive computational complexity and memory requirements of artificial intelligence models impede their deployability on edge computing devices of the Internet of Things (IoT). While Power-of-Two (PoT) quantization has been proposed to improve the efficiency of edge inference for Deep Neural Networks (DNNs), existing PoT schemes require a huge amount of bit-wise manipulation and have large memory overhead, and their efficiency is bounded by computation latency and memory footprint. To tackle this challenge, we present an efficient inference approach based on PoT quantization and model compression. An integer-only scalar PoT quantization (IOS-PoT) is designed jointly with a distribution loss regularizer, wherein the regularizer minimizes quantization errors and training disturbances. Additionally, a two-stage model compression is developed to effectively reduce the memory requirement and alleviate bandwidth usage in communications of networked heterogeneous learning systems. The product look-up table (P-LUT) inference scheme is leveraged to replace bit-shifting with only indexing and addition operations, achieving low-latency computation and enabling efficient edge accelerators. Finally, comprehensive experiments on Residual Networks (ResNets) and efficient architectures with the Canadian Institute for Advanced Research (CIFAR), ImageNet, and Real-world Affective Faces Database (RAF-DB) datasets indicate that our approach achieves a 2× to 10× improvement in the reduction of both weight size and computation cost in comparison to state-of-the-art methods. A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array (FPGA) platform for accelerating convolution operations, with performance results showing that P-LUT reduces the memory footprint by 1.45× and achieves more than 3× power efficiency and 2× resource efficiency compared to the conventional bit-shifting scheme.
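The idea of replacing bit-shifts with table indexing and addition can be sketched generically as follows. This is not the paper's IOS-PoT or P-LUT design: the 4-bit unsigned activations, the four exponent levels, the scale value, and all names are assumptions chosen for illustration.

```python
import math

LEVELS = 4        # hypothetical: weight exponents 0..3
ACT_VALUES = 16   # hypothetical: 4-bit unsigned activations 0..15

def pot_quantize(w, scale):
    """Map a float weight to (sign, exponent) with |w| ~ scale * 2**exponent."""
    if w == 0.0:
        return 0, 0
    k = round(math.log2(abs(w) / scale))
    k = max(0, min(LEVELS - 1, k))      # clamp to the representable exponents
    return (1 if w > 0 else -1), k

# Product look-up table, built once offline: P_LUT[a][k] == a * 2**k.
P_LUT = [[a << k for k in range(LEVELS)] for a in range(ACT_VALUES)]

def dot_shift(acts, qweights):
    """Reference integer dot product using a bit-shift per multiply."""
    return sum(s * (a << k) for a, (s, k) in zip(acts, qweights))

def dot_lut(acts, qweights):
    """Same dot product using only table indexing and addition."""
    return sum(s * P_LUT[a][k] for a, (s, k) in zip(acts, qweights))

weights = [0.5, -0.26, 0.13, 0.0]
qweights = [pot_quantize(w, scale=0.125) for w in weights]
acts = [3, 15, 7, 9]
result = dot_lut(acts, qweights)
```

Both functions return the same integer; the floating-point scale is applied once to the accumulated sum rather than per product.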
Quantized training has proven to be a prominent method for training deep neural networks under limited computational resources. It uses low-bit-width arithmetic with a proper scaling factor to achieve negligible accuracy loss. Cambricon-Q is an ASIC design proposed to efficiently support quantized training, and it achieves significant performance improvement. However, the design still has two caveats. First, Cambricon-Q instances with different hardware specifications may produce different numerical errors, resulting in non-reproducible behavior, which may become a major concern in critical applications. Second, Cambricon-Q cannot leverage data sparsity, so considerable cycles could still be squeezed out. To address these caveats, the acceleration core of Cambricon-Q is redesigned to support fine-grained irregular data processing. The new design not only enables acceleration on sparse data but also performs local dynamic quantization by contiguous value ranges (which is hardware independent) instead of contiguous addresses (which depends on hardware factors). Experimental results show that the accuracy loss of the method remains negligible and that the accelerator achieves a 1.61× performance improvement over Cambricon-Q, with about a 10% energy increase.
The Moon provides a unique environment for investigating nearby astrophysical events such as supernovae. Lunar samples retain valuable information from these events via detectable long-lived “fingerprint” radionuclides such as ⁶⁰Fe. In this work, we advanced the development of an accelerator mass spectrometry (AMS) method for detecting ⁶⁰Fe using the HI-13 tandem accelerator at the China Institute of Atomic Energy (CIAE). Since interferences could not be sufficiently removed solely with the existing magnetic systems of the tandem accelerator and the following Q3D magnetic spectrograph, a Wien filter with a maximum voltage of ±60 kV and a maximum magnetic field of 0.3 T was installed after the accelerator magnetic systems to lower the detection background for the low-abundance nuclide ⁶⁰Fe. A 1 μm thick Si₃N₄ foil was installed in front of the Q3D as an energy degrader. For particle detection, a multi-anode gas ionization chamber was mounted at the center of the focal plane of the spectrograph. Finally, a ⁶⁰Fe sample with an abundance of 1.125 × 10⁻¹⁰ was used to test the new AMS system. The results indicate that ⁶⁰Fe can be clearly distinguished from the isobar ⁶⁰Ni. The sensitivity was assessed to be better than 4.3 × 10⁻¹⁴ based on blank sample measurements lasting 5.8 h, and it could, in principle, be expected to reach approximately 2.5 × 10⁻¹⁵ if data were accumulated for 100 h, which is feasible for future lunar sample measurements because the main contaminants were sufficiently separated.
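The 100 h figure quoted above is consistent with a detection limit that scales inversely with measurement time, as one would expect if the blank stays free of background counts. That scaling rule is an assumption made here for illustration, not a statement from the abstract, and the function name is hypothetical.

```python
def scaled_sensitivity(s_ref, t_ref_h, t_new_h):
    """Extrapolate a background-free detection limit to a new counting time.

    Assumption: the limit is set by counting statistics with zero background,
    so it scales as 1 / (measurement time).
    """
    return s_ref * t_ref_h / t_new_h

# Values from the abstract: better than 4.3e-14 after 5.8 h of blank measurement.
s_100h = scaled_sensitivity(4.3e-14, 5.8, 100.0)
```

With these inputs the extrapolated value is about 2.5 × 10⁻¹⁵, matching the figure given in the abstract.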
Prompt radiation emitted during accelerator operation poses a significant health risk, necessitating a thorough search and securing of hazardous areas before operation begins. Currently, manual sweep methods are employed. However, the limitations of manual sweeps have become increasingly evident with the implementation of large-scale accelerators. By leveraging advances in machine vision technology, the automatic identification of stranded personnel in controlled areas through camera imagery presents a viable solution for efficient search and secure. Given the criticality of personal safety for stranded individuals, search-and-secure processes must be sufficiently reliable. To ensure comprehensive coverage, 180° camera groups were positioned on both sides of the accelerator tunnel to eliminate blind spots within the monitoring range. The YOLOv8 network model was modified to enable the detection of small targets, such as hands and feet, as well as larger targets formed by individuals near the cameras. Furthermore, the system incorporates a pedestrian recognition model that detects human body parts, and an information fusion strategy integrates the detected head, hands, and feet with the identified pedestrians as a cohesive unit. This strategy enhanced the capability of the model to identify pedestrians obstructed by equipment, resulting in a notable improvement in the recall rate. Specifically, recall rates of 0.915 and 0.82 were obtained for Datasets 1 and 2, respectively. Although there was a slight decrease in accuracy, this aligned with the intended purpose of the search-and-secure software design. Experimental tests conducted within an accelerator tunnel demonstrated the effectiveness of this approach in achieving reliable recognition outcomes.
Molecular Dynamics (MD) simulation for computing the Interatomic Potential (IAP) is a very important High-Performance Computing (HPC) application. MD simulation of particles of experimental relevance takes huge computation time, even on an expensive high-end server. Heterogeneous computing, a combination of a Field Programmable Gate Array (FPGA) and a computer, is proposed as a solution for computing MD simulations efficiently. In such heterogeneous computation, communication between the FPGA and the computer is necessary. One such MD simulation, explained in the paper, is the Artificial Neural Network (ANN)-based IAP computation of gold (Au₁₄₇ and Au₃₀₉) nanoparticles. MD simulation calculates the forces between atoms and the total energy of the chemical system. This work proposes a novel design and implementation of an ANN IAP-based MD simulation for Au₁₄₇ and Au₃₀₉ using communication protocols, such as Universal Asynchronous Receiver-Transmitter (UART) and Ethernet, for communication between the FPGA and the host computer. To improve the latency of MD simulation through heterogeneous computing, the UART and Ethernet communication protocols were explored to conduct an MD simulation of 50,000 cycles. In this study, computation times of 17.54 and 18.70 h were achieved with UART and Ethernet, respectively, compared to the conventional server time of 29 h for Au₁₄₇ nanoparticles. The results pave the way for the development of a Lab-on-a-chip application.
With the rapid development of deep learning algorithms, their computational complexity and functional diversity are increasing rapidly. However, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture is widening. An analysis of the algorithmic characteristics of convolutional neural networks (CNNs) shows that the access characteristics of convolution (CONV) and fully connected (FC) operations are very different. Based on this feature, a dual-mode reconfigurable distributed memory architecture for a CNN accelerator is designed. It can be configured in Bank mode or first-in, first-out (FIFO) mode to accommodate the access needs of different operations. In addition, a programmable memory control unit is designed, which effectively controls the dual-mode configurable distributed memory architecture by using customized access instructions and reduces the data access delay. The proposed architecture is verified and tested by parallel implementation of several CNN algorithms. The experimental results show that the peak bandwidth can reach 13.44 GB·s⁻¹ at an operating frequency of 120 MHz. This corresponds to 1.40, 1.12, 2.80, and 4.70 times the peak bandwidth of existing works.
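The bandwidth figures in this abstract can be unpacked with a short calculation. The sketch assumes 1 GB = 10⁹ bytes and simply divides by the quoted ratios to obtain the peak bandwidths they imply for the compared works; the constant and function names are hypothetical.

```python
PEAK_GB_PER_S = 13.44              # from the abstract
CLOCK_MHZ = 120.0                  # from the abstract
RATIOS = (1.40, 1.12, 2.80, 4.70)  # from the abstract

def bytes_per_cycle(gb_per_s, clock_mhz):
    """Bytes moved per clock cycle, assuming 1 GB = 1e9 bytes."""
    return gb_per_s * 1e9 / (clock_mhz * 1e6)

def implied_baselines(gb_per_s, ratios):
    """Peak bandwidths (GB/s) of the compared works implied by the ratios."""
    return [gb_per_s / r for r in ratios]

per_cycle = bytes_per_cycle(PEAK_GB_PER_S, CLOCK_MHZ)
baselines = implied_baselines(PEAK_GB_PER_S, RATIOS)
```

Under that assumption the peak rate works out to 112 bytes per clock cycle, and the first three ratios imply baselines of 9.6, 12.0, and 4.8 GB·s⁻¹.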
A thermoluminescent detector (TLD) was used for dose detection at the liver irradiation site in mice under linear accelerator precision radiotherapy, and a single high dose was used to irradiate the mouse liver to construct a biological model of radiation-induced liver injury (RILD), in order to determine the feasibility of constructing a precision radiotherapy model in small animals under a linear accelerator. A 360° arc volumetric rotational intensity-modulated radiotherapy (VMAT) plan with a prescribed dose of 2 Gy was developed for the planned target volume (PTV) at the location of the TLD within solid water to compare the dose measured by the TLD with the assessed parameters in the TPS system. The TLD was implanted in the livers of mice, and VMAT was planned based on the TLD to compare the measured and prescribed doses. C57BL/6J mice were randomly divided into control and 25-Gy radiation groups and were examined daily for changes in body weight. They were euthanized at 3 and 10 weeks after radiation, and the serum levels of liver enzymes such as alanine aminotransferase (ALT), aspartate aminotransferase (AST), and alkaline phosphatase (ALP) were measured; the irradiated areas of the mouse liver were examined for histopathological changes. The values measured by the TLD in solid water were within ±3% of the Dmean value of the evaluation parameter in the TPS system. The mice in the 25-Gy radiation group demonstrated pathological signs of radiation-induced liver injury at the site of liver irradiation. The deviation between the measured and prescribed doses of the TLD in the mouse liver ranged from −1.5% to 6%. Construction of an accurate model of RILD using the VMAT technique under a linear accelerator is feasible.
With the rapid development and popularization of artificial intelligence technology, the convolutional neural network (CNN) is applied in many fields, has begun to replace most traditional algorithms, and is gradually being deployed to terminal devices. However, the huge data movement and computational complexity of CNNs bring severe power consumption and performance challenges to the hardware, which hinders the application of CNNs in embedded devices such as smartphones and smart cars. This paper implements a CNN accelerator based on the Winograd convolution algorithm on a field-programmable gate array (FPGA). First, a convolution kernel decomposition method for Winograd convolution is proposed. A convolution kernel larger than 3×3 is divided into multiple 3×3 convolution kernels for the convolution operation, and the unsynchronized long convolution operation is processed. Then, we design a Winograd convolution array and use a configurable multiplier to flexibly realize multiplication for data of different precision. Experimental results on the VGG16 and AlexNet networks show that our accelerator achieves an energy efficiency 101 times that of the CPU and 5.8 times that of the GPU. It also has higher energy efficiency than other CNN accelerators.
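The Winograd idea and the kernel decomposition described above can be illustrated in one dimension. The sketch uses the standard F(2,3) minimal filtering algorithm, which produces two outputs of a 3-tap filter with four multiplications instead of six. The zero-padded split of a 5-tap kernel into two 3-tap kernels is an illustrative assumption and may differ from the paper's decomposition; all function names are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation over 4 inputs, using 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct(d, g, n_out=2):
    """Reference sliding-window correlation for comparison."""
    return [sum(d[i + k] * g[k] for k in range(len(g))) for i in range(n_out)]

def conv5_via_f23(d, g5):
    """Two outputs of a 5-tap filter from two 3-tap Winograd passes.

    Assumption: the kernel is split into g5[0:3] and g5[3:5] padded with one
    zero, and the two partial results are added.
    """
    padded = list(d) + [0]   # one zero so the second tile has 4 samples
    first = winograd_f23(padded[0:4], g5[0:3])
    second = winograd_f23(padded[3:7], list(g5[3:5]) + [0])
    return [a + b for a, b in zip(first, second)]

signal = [1, 2, 3, 4, 5, 6]
kernel5 = [1, -2, 3, 0.5, -1]
out = conv5_via_f23(signal, kernel5)
```

In hardware the filter-side factors such as (g0 + g1 + g2) / 2 are precomputed once per kernel, so only the four products remain in the inner loop.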
The China Fusion Engineering Test Reactor plans to build a 200 kV/25 A acceleration grid power supply (AGPS) for the negative-ion-based neutral beam injector prototype system. The AGPS uses a rectifier-inverter-isolated step-up structure, with a DC bus between the rectifier and the inverter. To limit DC bus voltage ripple and transient fluctuations, a large number of capacitors are used, which degrades the reliability of the power supply and occupies a large amount of space. This work finds that, because the rectifier and the inverter turn off at different times, the required capacitance depends mainly on the rectifier current when the inverter is turned off. On this basis, an active power filter (APF) scheme is proposed to absorb the current. To enhance the dynamic response of the APF, model predictive control is adopted. In this paper, the circuit structure of the APF is introduced, the prediction model is derived, and the corresponding control strategy and signal detection method are proposed. The simulation and experimental results show that the APF can track the transient current of the DC bus and significantly reduce the voltage fluctuation.
The dielectric laser accelerator (DLA) is a promising technology for achieving high-gradient acceleration in a compact design. Its advantages include ease of cascading and an energy gain per unit distance that can exceed that of conventional accelerators by two orders of magnitude. This paper establishes rules for efficient particle acceleration using dielectric structures based on basic equations, proposes a design principle for DLA structures with a clear physical picture, and verifies the accuracy of the corresponding formula for energy gain. DLA structures with different specifications, materials, and geometric shapes are constructed, and the achievable acceleration gradient is calculated. Our results demonstrate that effective acceleration can be achieved when the electric field sensed by particles in the acceleration cavity has zero frequency, which provides a powerful method for designing such devices. Furthermore, we demonstrate that the simplified formula for calculating energy gain presented in this paper can accurately determine the energy gain of particles during the design of acceleration structures using a dielectric accelerator.
Dielectric laser accelerators (DLAs) are considered promising candidates for on-chip particle accelerators that can achieve high acceleration gradients. This study explores various combinations of dielectric materials and accelerating structures based on the inverse Cherenkov effect. The designs utilize conventional processing methods and laser parameters currently in use. We optimize the structural model to enhance the acceleration gradient and the electron energy gain. To achieve higher acceleration gradients and energy gains, the selection of materials and structures should be based on the initial electron energy. Furthermore, we observed that the acceleration gradient of a given material varies differently at different initial electron energies. These findings suggest that on-chip accelerators are feasible with the help of these structures and materials.
In this paper, we propose a novel stacked laser dielectric acceleration structure. This structure is based on the inverse Cherenkov effect and represented by a parametric design formulation. Compared to existing dielectric laser accelerators relying on the inverse Smith–Purcell effect, the proposed structure provides an extended-duration synchronous acceleration field without requiring the pulse front tilting technique. This advantage significantly reduces the required pulse duration. In addition, the easy-to-integrate layered structure facilitates cascade acceleration, and simulations have shown that low-energy electron beams can be cascaded through high gradients over extended distances. These practical advantages demonstrate the potential of this new structure for future chip accelerators.
This paper presents the architecture of a Convolutional Neural Network (CNN) accelerator based on a new processing element (PE) array called a diagonal cyclic array (DCA). As demonstrated, it can significantly reduce the burden of repeated memory accesses for the feature data and weight parameters of CNN models, which maximizes the data reuse rate and improves the computation speed. Furthermore, an integrated computation architecture has been implemented for the activation function, max-pooling, and activation function after convolution calculation, reducing the hardware resources. To evaluate the effectiveness of the proposed architecture, a CNN accelerator has been implemented for You Only Look Once version 2 (YOLOv2)-Tiny, which consists of 9 layers. In addition, a methodology for optimizing the local buffer size with little sacrifice of inference speed is presented. We implemented the proposed CNN accelerator using a Xilinx Zynq ZCU102 UltraScale+ Field Programmable Gate Array (FPGA) and ISE Design Suite. The FPGA implementation uses 34,336 Look-Up Tables (LUTs), 576 Digital Signal Processing (DSP) blocks, and only 58 KB of on-chip memory. It achieves a mean Average Precision at an intersection-over-union threshold of 0.5 (mAP@0.5) of 57.92% and 56.42% using quantized 16-bit and 8-bit full-integer data manipulation, respectively, with only 0.68% as a loss for the 8-bit version, and computation times of 137.9 and 69 ms per input image, respectively, at a clock speed of 200 MHz. These speeds are expected to increase about fivefold at a clock speed of 1 GHz if the design is implemented in a silicon System on Chip (SoC) using a sub-micron process.
In this study, an X-band standing-wave biperiodic linear accelerator was developed for medical radiotherapy that can accelerate electrons to 9 MeV using a 2.4-MW klystron. The structure works in π/2 mode and adopts magnetic coupling between cavities, generating an appropriate adjacent mode separation of 10 MHz. The accelerator is less than 600 mm long and comprises four bunching cells and 29 normal cells. Geometry optimizations, full-scale radiofrequency (RF) simulations, and beam dynamics calculations were performed. The accelerator was fabricated and examined using a low-power RF test. The cold test results showed good agreement with the simulation and actual measurement results. In the high-power RF test, the output beam current, energy spectrum, capture ratio, and spot size at the accelerator exit were measured. With an input power of 2.4 MW, the pulse current was 100 mA, and the output spot root-mean-square radius was approximately 0.5 mm. The output kinetic energy was 9.04 MeV with a spectral FWHM of 3.5%, demonstrating the good performance of this accelerator.
With the continuous development of deep learning, the Deep Convolutional Neural Network (DCNN) has attracted wide attention in industry due to its high accuracy in image classification. Compared with other DCNN hardware deployment platforms, the Field Programmable Gate Array (FPGA) offers programmability, low power consumption, parallelism, and low cost. However, the enormous amount of calculation in a DCNN and the limited logic capacity of an FPGA restrict the energy efficiency of a DCNN accelerator. The traditional sequential sliding window method can improve the throughput of the DCNN accelerator through data multiplexing, but its data multiplexing rate is low because it repeatedly reads the data between rows. This paper proposes a fast data readout strategy based on a circular sliding window data reading method, which improves the multiplexing rate of data between rows by optimizing the memory access order of the input data. In addition, the multiplication bit width of the DCNN accelerator is much smaller than that of the Digital Signal Processing (DSP) blocks on the FPGA, which means that resources are wasted if each multiplication uses a single DSP. A multiplier sharing strategy is therefore proposed, in which the multiplier of the accelerator is customized so that a single DSP block can complete multiple groups of 4-, 6-, and 8-bit signed multiplications in parallel. Finally, based on these two strategies, an FPGA-optimized accelerator is proposed. The accelerator is written in Verilog and deployed on a Xilinx VCU118. When the accelerator recognizes the CIFAR-10 dataset, its energy efficiency is 39.98 GOPS/W, a 1.73× improvement in energy efficiency over previous DCNN FPGA accelerators. When the accelerator recognizes the ImageNet dataset, its energy efficiency is 41.12 GOPS/W, which is 1.28× to 3.14× the energy efficiency of other designs.
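The multiplier sharing idea, in which one wide multiplier produces several narrow signed products, can be sketched generically as follows. This is a common packing trick and not necessarily the scheme used in the paper; the 8-bit operands, the 16-bit field width, and the function names are assumptions.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def packed_multiply(a, b, c, shift=16):
    """Compute a*c and b*c with a single wide multiplication.

    Assumption: a, b, c are 8-bit signed values, so b*c fits in the signed
    16-bit low field and can be separated from a*c in the high field.
    """
    packed = (a << shift) + b          # place a in the high field, b in the low
    product = packed * c               # the one wide multiply
    low = to_signed(product, shift)    # recovers b*c
    high = (product - low) >> shift    # removes the borrow, recovers a*c
    return high, low

pair = packed_multiply(-7, 5, -3)
```

Subtracting the sign-extended low field before shifting is what keeps the high product correct when the low product is negative.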
With the increasing demand for computational power in artificial intelligence (AI) algorithms, dedicated accelerators have become a necessity. However, the complexity of hardware architectures, the vast design search space, and the complex tasks of accelerators pose significant challenges. Traditional search methods can become prohibitively slow as the search space continues to expand. A design space exploration (DSE) method based on transfer learning is proposed, which reduces the time for repeated training and uses multi-task models for different tasks on the same processor. The proposed method accurately predicts the latency and energy consumption associated with neural network accelerator design parameters, enabling faster identification of optimal outcomes than traditional methods. Compared with other DSE methods that use a multilayer perceptron (MLP), the required training time is shorter. Comparative experiments with other methods demonstrate that the proposed method improves the efficiency of DSE without compromising the accuracy of the results.
Funding (nanoparticle-assisted laser wakefield accelerator paper): supported by the Air Force Office of Scientific Research, Grant No. FA9550-17-1-0264; by the DOE, Office of Science, Fusion Energy Sciences, under Contract No. DE-SC0021125; and by the U.S. Department of Energy, Grant No. DESC0011617. D. A. Jarozynski, E. Brunetti, B. Ersfeld, and S. Yoffe would like to acknowledge support from the U.K. EPSRC (Grant Nos. EP/J018171/1 and EP/N028694/1), the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 871124 Laserlab-Europe, and EuPRAXIA (Grant No. 653782); funded by the N8 research partnership and EPSRC (Grant No. EP/T022167/1).
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11975214).
Abstract: We present the first on-chip positron accelerator based on dielectric laser acceleration. This approach significantly reduces the physical dimensions of the positron acceleration apparatus, enhancing its feasibility for diverse applications. By using a stacked acceleration structure and far-infrared laser technology, we achieve a seven-stage acceleration structure that surpasses the acceleration distance and energy gain of previous dielectric laser acceleration methods. Additionally, we compress the positron beam to an ultrafast sub-femtosecond scale during the acceleration process, a greater degree of compression than traditional methods achieve. We also demonstrate the robustness of the stacked acceleration structure through the successful acceleration of the positron beam.
Abstract: Flexibility in radiotherapy improves if patients can be moved between any of a department's medical linear accelerators (LINACs) without changing anything in the patient's treatment plan. For this to be possible, the dosimetric characteristics of the various accelerators must be the same, or nearly the same. The purpose of this work is to describe and compare measurements and parameters after the initial vendor-recommended beam matching of five LINACs. Deviations related to dose calculations and to beam-matched accelerators may compromise treatment accuracy. The safest and most practical way to ensure that all accelerators are within clinically acceptable accuracy is to include treatment planning system (TPS) calculations in the LINAC matching evaluation. The TPS was used to create three photon plans with field sizes of 3 × 3 cm, 10 × 10 cm, and 25 × 25 cm at a depth of 4.5 cm in Perspex. The calculated TPS plans were sent to Mosaiq to be delivered by the five LINACs. The TPS plans were compared with the measurement data from the five LINACs using gamma analysis with 2% and 2 mm criteria. The results suggest that four of the five LINACs showed generally good agreement, with less than 2% deviation between the planned and measured dose distributions. However, one LINAC, named "Asterix", exhibited a deviation of 2.121% from the planned dose. The results show that the performance of all the LINACs was within the acceptable deviation and that they deliver radiation dose consistently and accurately.
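As an illustration of the 2%/2 mm gamma criterion mentioned above, the following is a minimal one-dimensional sketch. It is not the software used in the study: the profile shapes, the 0.5 mm grid, and the choice of global normalization to the maximum planned dose are all assumptions made for the example.

```python
import numpy as np

def gamma_index_1d(x_ref, d_ref, x_eval, d_eval, dose_tol=0.02, dist_tol_mm=2.0):
    """Gamma index at each reference point of a 1D dose profile.

    Assumption: global normalization, i.e. the dose tolerance is
    dose_tol times the maximum reference dose.
    """
    d_norm = dose_tol * d_ref.max()
    gammas = np.empty(len(d_ref), dtype=float)
    for i, (xr, dr) in enumerate(zip(x_ref, d_ref)):
        dist_term = ((x_eval - xr) / dist_tol_mm) ** 2
        dose_term = ((d_eval - dr) / d_norm) ** 2
        # Gamma is the minimum combined distance over all evaluated points.
        gammas[i] = np.sqrt((dist_term + dose_term).min())
    return gammas

# Hypothetical profiles: a flat-topped planned field and a measurement
# that is shifted by 0.5 mm and scaled by 1%.
x = np.linspace(-50.0, 50.0, 201)                     # position in mm
planned = np.exp(-((x / 30.0) ** 4))
measured = 1.01 * np.exp(-(((x - 0.5) / 30.0) ** 4))

g = gamma_index_1d(x, planned, x, measured)
pass_rate = np.mean(g <= 1.0)                         # fraction of points passing
print(f"max gamma = {g.max():.3f}, pass rate = {pass_rate:.1%}")
```

A point passes when its gamma value is at most 1, so the pass rate is the fraction of reference points for which some measured point lies within the combined dose and distance tolerance.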
Funding: Supported by the National Key R&D Program of China (2019YFA0405400).
Abstract: In recent years, heavy ion accelerator technology has developed rapidly worldwide and has been widely applied in space radiation simulation and particle therapy. The extracted ion beams usually require very high uniformity over the irradiation area, because uniformity directly affects experimental precision and therapeutic effect. Specifically, ultra-large-area, high-uniformity scanning is a crucial requirement for spacecraft radiation effects assessment and serves as a core specification for beamline terminal design. In the 300 MeV proton and heavy ion accelerator complex at the Space Environment Simulation and Research Infrastructure (SESRI), proton and heavy ion beams will be accelerated and ultimately delivered to three irradiation terminals. To achieve the required large irradiation area of 320 mm × 320 mm, horizontal and vertical scanning magnets are used in the extraction beam line. However, given the various requirements for beam species and energies, the tracking accuracy of the power supplies (PSs), the eddy current effect of the scanning magnets, and the fluctuation of the ion bunch structure all reduce the irradiation uniformity. To mitigate these effects, a beam uniformity optimization method based on the measured beam distribution was proposed and applied in the accelerator complex at SESRI. In the experiment, the uniformity was optimized from 75% to over 90% after five iterations of adjustment to the PS waveforms. This paper introduces the method and the experimental results.
Funding: This work was supported by the Open Fund Project of the State Key Laboratory of Intelligent Vehicle Safety Technology (Grant No. IVSTSKL-202311); Key Projects of the Science and Technology Research Programme of the Chongqing Municipal Education Commission (Grant No. KJZD-K202301505); the Cooperation Project between Chongqing Municipal Undergraduate Universities and Institutes Affiliated to the Chinese Academy of Sciences in 2021 (Grant No. HZ2021015); and the Chongqing Graduate Student Research Innovation Program (Grant No. CYS240801).
Abstract: The massive computational complexity and memory requirements of artificial intelligence models impede their deployability on edge computing devices of the Internet of Things (IoT). While Power-of-Two (PoT) quantization has been proposed to improve the efficiency of edge inference of Deep Neural Networks (DNNs), existing PoT schemes require a huge amount of bit-wise manipulation and have a large memory overhead, and their efficiency is bounded by the bottleneck of computation latency and memory footprint. To tackle this challenge, we present an efficient inference approach based on PoT quantization and model compression. An integer-only scalar PoT quantization (IOS-PoT) is designed jointly with a distribution loss regularizer, wherein the regularizer minimizes quantization errors and training disturbances. Additionally, two-stage model compression is developed to effectively reduce the memory requirement and alleviate bandwidth usage in communications of networked heterogeneous learning systems. The product look-up table (P-LUT) inference scheme is leveraged to replace bit-shifting with only indexing and addition operations, achieving low-latency computation and enabling efficient edge accelerators. Finally, comprehensive experiments on Residual Networks (ResNets) and efficient architectures with the Canadian Institute for Advanced Research (CIFAR), ImageNet, and Real-world Affective Faces Database (RAF-DB) datasets indicate that our approach achieves a 2× to 10× reduction in both weight size and computation cost in comparison to state-of-the-art methods. A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array (FPGA) platform for accelerating convolution operations, with performance results showing that P-LUT reduces memory footprint by 1.45× and achieves more than 3× power efficiency and 2× resource efficiency compared to the conventional bit-shifting scheme.
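To make the idea of replacing shifts with table look-ups concrete, here is a minimal sketch of generic power-of-two quantization followed by a look-up-table dot product. It is not the IOS-PoT or P-LUT design from the paper: the exponent range, the 4-bit unsigned activations, and all function names are hypothetical choices for illustration.

```python
import numpy as np

E_MIN, E_MAX = -3, 0     # hypothetical exponent range: |w| in {2^-3, ..., 2^0}
ACT_LEVELS = 16          # hypothetical 4-bit unsigned activations (0..15)

def quantize_pot(weights):
    """Round each weight to the nearest signed power of two (in log2 space)."""
    w = np.asarray(weights, dtype=float)
    sign = np.sign(w).astype(int)
    mag = np.maximum(np.abs(w), 2.0 ** E_MIN)          # avoid log2(0)
    exp = np.clip(np.round(np.log2(mag)), E_MIN, E_MAX).astype(int)
    return sign, exp - E_MIN                           # code 0 means 2^E_MIN

# Table of precomputed products, filled once. Entry [a, c] holds
# a * 2^c, i.e. the product scaled by 2^(-E_MIN) so it stays an integer.
PRODUCT_LUT = np.array(
    [[a << c for c in range(E_MAX - E_MIN + 1)] for a in range(ACT_LEVELS)]
)

def lut_dot(activations, sign, code):
    """Dot product using only table indexing and signed accumulation."""
    acc = 0
    for a, s, c in zip(activations, sign, code):
        acc += s * PRODUCT_LUT[a, c]     # s is -1, 0, or +1: add or subtract
    return acc / 2 ** (-E_MIN)           # undo the integer scaling once

weights = [0.5, -0.25, 1.0, 0.13]
activations = [3, 7, 1, 15]
sign, code = quantize_pot(weights)
result = lut_dot(activations, sign, code)
print(result)
```

In this sketch the shifts happen only while the table is built; at inference time each multiply becomes one index operation plus one addition, which is the general trade the abstract describes.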
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFB4501601); the National Natural Science Foundation of China (Nos. 62102398, U20A20227, 62222214, 62002338, U22A2028, U19B2019); the Chinese Academy of Sciences Project for Young Scientists in Basic Research (YSBR-029); and the Youth Innovation Promotion Association of the Chinese Academy of Sciences.
Abstract: Quantized training has proven to be a prominent method for training deep neural networks under limited computational resources. It uses low bit-width arithmetic with a proper scaling factor to achieve negligible accuracy loss. Cambricon-Q is an ASIC design proposed to efficiently support quantized training, and it achieves significant performance improvement. However, the design still has two caveats. First, Cambricon-Q instances with different hardware specifications may produce different numerical errors, resulting in non-reproducible behavior, which may become a major concern in critical applications. Second, Cambricon-Q cannot leverage data sparsity, where considerable cycles could still be saved. To address these caveats, the acceleration core of Cambricon-Q is redesigned to support fine-grained irregular data processing. The new design not only enables acceleration on sparse data, but also enables local dynamic quantization by contiguous value ranges (which is hardware independent) instead of contiguous addresses (which depends on hardware factors). Experimental results show that the accuracy loss of the method remains negligible, and the accelerator achieves a 1.61× performance improvement over Cambricon-Q, with about a 10% energy increase.
Funding: Supported by the National Natural Science Foundation of China (Nos. 12125509, 12222514, 11961141003, and 12005304); the National Key Research and Development Project (No. 2022YFA1602301); the CAST Young Talent Support Plan; the CNNC Science Fund for Talented Young Scholars; and Continuous support for basic scientific research projects.
Abstract: The Moon provides a unique environment for investigating nearby astrophysical events such as supernovae. Lunar samples retain valuable information from these events via detectable long-lived "fingerprint" radionuclides such as ⁶⁰Fe. In this work, we advanced the development of an accelerator mass spectrometry (AMS) method for detecting ⁶⁰Fe using the HI-13 tandem accelerator at the China Institute of Atomic Energy (CIAE). Since interferences could not be sufficiently removed solely with the existing magnetic systems of the tandem accelerator and the following Q3D magnetic spectrograph, a Wien filter with a maximum voltage of ±60 kV and a maximum magnetic field of 0.3 T was installed after the accelerator magnetic systems to lower the detection background for the low-abundance nuclide ⁶⁰Fe. A 1 μm thick Si₃N₄ foil was installed in front of the Q3D as an energy degrader. For particle detection, a multi-anode gas ionization chamber was mounted at the center of the focal plane of the spectrograph. Finally, a ⁶⁰Fe sample with an abundance of 1.125 × 10⁻¹⁰ was used to test the new AMS system. The results indicate that ⁶⁰Fe can be clearly distinguished from the isobar ⁶⁰Ni. The sensitivity was assessed to be better than 4.3 × 10⁻¹⁴ based on blank sample measurements lasting 5.8 h, and the sensitivity could, in principle, be expected to reach approximately 2.5 × 10⁻¹⁵ when data are accumulated for 100 h, which is feasible for future lunar sample measurements because the main contaminants were sufficiently separated.
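The two sensitivity figures quoted above are consistent with a simple inverse scaling in counting time, as expected when the blank measurement is essentially background-free. The abstract does not state how the 100 h estimate was made, so the scaling law in this short check is an assumption, and the function name is illustrative.

```python
def scaled_sensitivity(s0, t0_hours, t_hours):
    """Scale a sensitivity limit assuming it improves as 1 / (counting time)."""
    return s0 * t0_hours / t_hours

# Measured limit: 4.3e-14 after 5.8 h of blank-sample counting.
projected = scaled_sensitivity(4.3e-14, 5.8, 100.0)
print(f"{projected:.3e}")   # about 2.5e-15, matching the quoted estimate
```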
Abstract: Prompt radiation emitted during accelerator operation poses a significant health risk, necessitating a thorough search and securing of hazardous areas prior to startup. Currently, manual sweep methods are employed. However, the limitations of manual sweeps have become increasingly evident with the implementation of large-scale accelerators. By leveraging advancements in machine vision technology, the automatic identification of stranded personnel in controlled areas through camera imagery presents a viable solution for efficient search and security. Given the criticality of personal safety for stranded individuals, search and security processes must be sufficiently reliable. To ensure comprehensive coverage, 180° camera groups were positioned on both sides of the accelerator tunnel to eliminate blind spots within the monitoring range. The YOLOv8 network model was modified to enable the detection of small targets, such as hands and feet, as well as larger targets formed by individuals near the cameras. Furthermore, the system incorporates a pedestrian recognition model that detects human body parts, and an information fusion strategy integrates the detected head, hands, and feet with the identified pedestrians as a cohesive unit. This strategy enhanced the capability of the model to identify pedestrians obstructed by equipment, resulting in a notable improvement in the recall rate. Specifically, recall rates of 0.915 and 0.82 were obtained for Datasets 1 and 2, respectively. Although there was a slight decrease in accuracy, this aligned with the intended purpose of the search-and-secure software design. Experimental tests conducted within an accelerator tunnel demonstrated the effectiveness of this approach in achieving reliable recognition outcomes.
Abstract: Molecular Dynamics (MD) simulation based on an Interatomic Potential (IAP) is a very important High-Performance Computing (HPC) application. MD simulation of particles of experimental relevance takes a huge amount of computation time, even on an expensive high-end server. Heterogeneous computing, a combination of a Field Programmable Gate Array (FPGA) and a computer, is proposed as a solution to compute MD simulations efficiently. Such heterogeneous computation requires communication between the FPGA and the computer. One such MD simulation, explained in the paper, is the Artificial Neural Network (ANN)-based IAP computation of gold (Au₁₄₇ and Au₃₀₉) nanoparticles. The MD simulation calculates the forces between atoms and the total energy of the chemical system. This work proposes a novel design and implementation of an ANN IAP-based MD simulation for Au₁₄₇ and Au₃₀₉ that uses communication protocols such as Universal Asynchronous Receiver-Transmitter (UART) and Ethernet between the FPGA and the host computer. To improve the latency of MD simulation through heterogeneous computing, both protocols were explored for an MD simulation of 50,000 cycles. In this study, computation times of 17.54 and 18.70 h were achieved with UART and Ethernet, respectively, compared to the conventional server time of 29 h for Au₁₄₇ nanoparticles. The results pave the way for the development of a lab-on-a-chip application.
Funding: Supported by the National Key R&D Program of China (No. 2022ZD0119001); the National Natural Science Foundation of China (Nos. 61834005, 61802304); the Education Department of Shaanxi Province (No. 22JY060); and the Shaanxi Provincial Key Research and Development Plan (No. 2024GX-YBXM-100).
Abstract: With the rapid development of deep learning algorithms, their computational complexity and functional diversity are increasing rapidly. However, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture is getting worse. An analysis of the algorithmic characteristics of convolutional neural networks (CNNs) shows that the access characteristics of convolution (CONV) and fully connected (FC) operations are very different. Based on this feature, a dual-mode reconfigurable distributed memory architecture for a CNN accelerator is designed. It can be configured in Bank mode or first-in, first-out (FIFO) mode to accommodate the access needs of different operations. In addition, a programmable memory control unit is designed, which effectively controls the dual-mode configurable distributed memory architecture using customized access instructions and reduces the data access delay. The proposed architecture is verified and tested through parallel implementation of several CNN algorithms. The experimental results show that the peak bandwidth can reach 13.44 GB/s at an operating frequency of 120 MHz. This work achieves 1.40, 1.12, 2.80, and 4.70 times the peak bandwidth of the existing works compared.
Funding: Supported by the Natural Science Foundation of Anhui Province (No. 2208085MA13); the Wu Je Ping Medical Foundation (No. 320.6750.2020-10-40); and the Key Research and Development Program of Anhui Province (No. 202004J07020052).
Abstract: This study applied a thermoluminescent detector (TLD) for dose detection at the liver irradiation site in mice under linear-accelerator precision radiotherapy, and used a single high dose to irradiate the mouse liver and construct a biological model of radiation-induced liver injury (RILD), in order to determine the feasibility of constructing a precision radiotherapy model in small animals under a linear accelerator. A 360° arc volumetric-modulated arc therapy (VMAT) plan with a prescribed dose of 2 Gy was developed for the planned target volume (PTV) at the location of the TLD within solid water, to compare the dose measured by the TLD with the evaluation parameters in the TPS system. The TLD was implanted in the livers of mice, and VMAT was planned based on the TLD to compare the measured and prescribed doses. C57BL/6J mice were randomly divided into control and 25-Gy radiation groups and were examined daily for changes in body weight. They were euthanized at 3 and 10 weeks after radiation, and the serum levels of liver enzymes such as alanine aminotransferase (ALT), aspartate aminotransferase (AST), and alkaline phosphatase (ALP) were measured, and the irradiated areas of the mouse liver were examined for histopathological changes. The values measured by the TLD in solid water were within ±3% of the Dmean value of the evaluation parameter in the TPS system. The mice in the 25-Gy radiation group showed pathological signs of radiation-induced liver injury at the site of liver irradiation. The deviation between the measured and prescribed doses of the TLD in the mouse liver ranged from -1.5% to 6%. Construction of an accurate model of RILD using the VMAT technique under a linear accelerator is therefore feasible.
Funding: Supported by the Project of the State Grid Corporation of China in 2022 (No. 5700-201941501A-0-0-00) and the National Natural Science Foundation of China (No. U21B2031).
Abstract: With the rapid development and popularization of artificial intelligence technology, convolutional neural networks (CNNs) are applied in many fields, are beginning to replace most traditional algorithms, and are gradually being deployed to terminal devices. However, the heavy data movement and computational complexity of CNNs bring large power consumption and performance challenges to the hardware, which hinders the application of CNNs in embedded devices such as smartphones and smart cars. This paper implements a CNN accelerator based on the Winograd convolution algorithm on a field-programmable gate array (FPGA). First, a convolution kernel decomposition method for Winograd convolution is proposed. A convolution kernel larger than 3×3 is divided into multiple 3×3 convolution kernels for the convolution operation, and the unsynchronized long convolution operation is processed. Then, we design a Winograd convolution array and use a configurable multiplier to flexibly realize multiplication for data of different precision. Experimental results on the VGG16 and AlexNet networks show that our accelerator achieves the highest energy efficiency, 101 times that of the CPU and 5.8 times that of the GPU. It also has higher energy efficiency than other CNN accelerators.
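The kernel decomposition idea above can be sketched in one dimension. The code below uses the standard Winograd F(2,3) transform matrices and splits a 5-tap kernel into two zero-padded 3-tap kernels. The 1D simplification, the splitting layout, and the function names are assumptions for illustration; the paper works with 2D kernels on an FPGA, and its exact decomposition is not given in the abstract.

```python
import numpy as np

# Standard Winograd F(2,3) matrices: 2 outputs of a 3-tap filter
# from 4 inputs, using 4 multiplications instead of 6.
BT = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs y[i] = sum_k d[i+k] * g[k] from a 4-sample tile d."""
    return AT @ ((G @ g) * (BT @ d))

def conv3_winograd(signal, g):
    """Valid 3-tap correlation computed tile by tile with F(2,3)."""
    n_out = len(signal) - 2
    out = np.zeros(n_out)
    for i in range(0, n_out - 1, 2):
        out[i:i + 2] = winograd_f23(signal[i:i + 4], g)
    if n_out % 2:                       # odd leftover output: compute directly
        out[-1] = signal[-3:] @ g
    return out

def conv5_via_3tap(signal, g5):
    """Split a 5-tap kernel into two 3-tap kernels (second one zero-padded)."""
    g_a = np.asarray(g5[:3], dtype=float)
    g_b = np.array([g5[3], g5[4], 0.0])
    n_out = len(signal) - 4
    padded = np.append(signal, 0.0)     # one extra sample for the padded tap
    y_a = conv3_winograd(padded, g_a)
    y_b = conv3_winograd(padded, g_b)
    # The second piece acts on the signal shifted by 3 samples.
    return y_a[:n_out] + y_b[3:3 + n_out]

rng = np.random.default_rng(0)
sig = rng.standard_normal(11)
g5 = rng.standard_normal(5)
split_result = conv5_via_3tap(sig, g5)
direct_result = np.correlate(sig, g5, mode="valid")
print(np.allclose(split_result, direct_result))
```

The comparison against `np.correlate` checks that the partial results from the two small kernels add up to the full 5-tap output once the second piece is offset by three samples.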
Funding: Supported in part by the National Key Research and Development Program of China (No. 2017YFE0300104) and in part by the National Natural Science Foundation of China (No. 51821005).
Abstract: The China Fusion Engineering Test Reactor plans to build a 200 kV/25 A acceleration grid power supply (AGPS) for the negative-ion-based neutral beam injector prototype system. The AGPS uses a rectifier-inverter-isolated step-up structure, with a DC bus between the rectifier and the inverter. To limit DC bus voltage ripple and transient fluctuations, a large number of capacitors are used, which degrades the reliability of the power supply and occupies a large amount of space. This work finds that, because of the difference in the turn-off times of the rectifier and the inverter, the required capacitance mainly depends on the rectifier current when the inverter is turned off. On this basis, an active power filter (APF) scheme is proposed to absorb this current. To enhance the dynamic response of the APF, model predictive control is adopted. This paper introduces the circuit structure of the APF, derives the prediction model, and proposes the corresponding control strategy and signal detection method. Simulation and experimental results show that the APF can track the transient current of the DC bus and significantly reduce the voltage fluctuation.
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 11975214).
Abstract: The dielectric laser accelerator (DLA) is a promising technology for achieving high-gradient acceleration in a compact design. Its advantages include ease of cascading and an energy gain per unit distance that can exceed that of conventional accelerators by two orders of magnitude. This paper establishes rules for efficient particle acceleration using dielectric structures based on basic equations, proposes a design principle for DLA structures with a clear physical picture, and verifies the accuracy of the corresponding formula for energy gain. DLA structures with different specifications, materials, and geometric shapes are constructed, and the achievable acceleration gradient is calculated. Our results demonstrate that effective acceleration can be achieved when the electric field sensed by particles in the acceleration cavity has zero frequency, which provides a powerful method for designing such devices. Furthermore, we demonstrate that the simplified formula for energy gain presented in this paper can accurately determine the energy gain of particles during the design of acceleration structures for a dielectric accelerator.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11975214).
Abstract: Dielectric laser accelerators (DLAs) are considered promising candidates for on-chip particle accelerators that can achieve high acceleration gradients. This study explores various combinations of dielectric materials and acceleration structures based on the inverse Cherenkov effect. The designs use conventional processing methods and laser parameters currently in use. We optimize the structural model to enhance the acceleration gradient and the electron energy gain. To achieve higher acceleration gradients and energy gains, the selection of materials and structures should be based on the initial electron energy. Furthermore, we observed that the acceleration gradient of a given material varies differently at different initial electron energies. These findings suggest that on-chip accelerators are feasible with the help of these structures and materials.
Funding: Supported by the National Natural Science Foundation of China (Nos. 12004353, 11975214, 11991071, 11905202, and 12174350); the Key Laboratory Foundation of the Sciences and Technology on Plasma Physics Laboratory (No. 6142A04200103); and Independent Scientific Research (No. JCKYS2021212011).
Abstract: In this paper, we propose a novel stacked laser dielectric acceleration structure. This structure is based on the inverse Cherenkov effect and is represented by a parametric design formulation. Compared to existing dielectric laser accelerators relying on the inverse Smith–Purcell effect, the proposed structure provides an extended-duration synchronous acceleration field without requiring the pulse front tilting technique. This advantage significantly reduces the required pulse duration. In addition, the easy-to-integrate layered structure facilitates cascade acceleration, and simulations have shown that low-energy electron beams can be cascaded through high gradients over extended distances. These practical advantages demonstrate the potential of this new structure for future chip accelerators.
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A5A8026986); the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-01304, Development of Self-learnable Mobile Recursive Neural Network Processor Technology); the MSIT (Ministry of Science and ICT), Korea, under the Grand Information Technology Research Center support program (IITP-2023-2020-0-01462) supervised by the IITP; and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1F1A1061314).
Abstract: This paper presents the architecture of a Convolutional Neural Network (CNN) accelerator based on a new processing element (PE) array called a diagonal cyclic array (DCA). As demonstrated, it can significantly reduce the burden of repeated memory accesses for feature data and weight parameters of CNN models, which maximizes the data reuse rate and improves the computation speed. Furthermore, an integrated computation architecture has been implemented for the activation function and max-pooling after the convolution calculation, reducing the hardware resources. To evaluate the effectiveness of the proposed architecture, a CNN accelerator has been implemented for You Only Look Once version 2 (YOLOv2)-Tiny, which consists of 9 layers. A methodology to optimize the local buffer size with little sacrifice of inference speed is also presented. We implemented the proposed CNN accelerator using a Xilinx Zynq ZCU102 Ultrascale+ Field Programmable Gate Array (FPGA) and ISE Design Suite. The FPGA implementation uses 34,336 Look-Up Tables (LUTs), 576 Digital Signal Processing (DSP) blocks, and an on-chip memory of only 58 KB. It achieves accuracies of 57.92% and 56.42% mean Average Precision at a 0.5 intersection-over-union threshold (mAP@0.5) using quantized 16-bit and 8-bit full-integer data manipulation, with only 0.68% as a loss for the 8-bit version, and computation times of 137.9 and 69 ms per input image, respectively, at a clock speed of 200 MHz. These speeds are expected to increase roughly fivefold at a clock speed of 1 GHz if the design is implemented in a silicon System on Chip (SoC) using a sub-micron process.
Funding: Supported by the Key R&D Project of the Ministry of Science and Technology of China (No. 2022YFC2402300).
Abstract: In this study, an X-band standing-wave biperiodic linear accelerator was developed for medical radiotherapy that can accelerate electrons to 9 MeV using a 2.4-MW klystron. The structure works in the π/2 mode and adopts magnetic coupling between cavities, generating an appropriate adjacent mode separation of 10 MHz. The accelerator is less than 600 mm long and consists of four bunching cells and 29 normal cells. Geometry optimizations, full-scale radiofrequency (RF) simulations, and beam dynamics calculations were performed. The accelerator was fabricated and examined using a low-power RF test. The cold test results showed good agreement between the simulation and the actual measurements. In the high-power RF test, the output beam current, energy spectrum, capture ratio, and spot size at the accelerator exit were measured. With an input power of 2.4 MW, the pulse current was 100 mA, and the output spot root-mean-square radius was approximately 0.5 mm. The output kinetic energy was 9.04 MeV with a spectral FWHM of 3.5%, demonstrating the good performance of this accelerator.
Funding: Supported in part by the Major Program of the Ministry of Science and Technology of China under Grant 2019YFB2205102 and in part by the National Natural Science Foundation of China under Grants 61974164, 62074166, 61804181, 62004219, 62004220, and 62104256.
Abstract: With the continuous development of deep learning, the Deep Convolutional Neural Network (DCNN) has attracted wide attention in industry due to its high accuracy in image classification. Compared with other DCNN hardware deployment platforms, the Field Programmable Gate Array (FPGA) has the advantages of programmability, low power consumption, parallelism, and low cost. However, the enormous amount of computation in a DCNN and the limited logic capacity of an FPGA restrict the energy efficiency of the DCNN accelerator. The traditional sequential sliding window method can improve the throughput of the DCNN accelerator through data multiplexing, but its data multiplexing rate is low because it repeatedly reads the data between rows. This paper proposes a fast data readout strategy based on a circular sliding window data reading method, which improves the multiplexing rate of data between rows by optimizing the memory access order of the input data. In addition, the multiplication bit width of the DCNN accelerator is much smaller than that of the Digital Signal Processing (DSP) blocks on the FPGA, which means that resources are wasted if each multiplication uses a single DSP. A multiplier sharing strategy is therefore proposed: the multiplier of the accelerator is customized so that a single DSP block can complete multiple groups of 4-, 6-, and 8-bit signed multiplications in parallel. Finally, based on the two strategies above, an FPGA-optimized accelerator is proposed. The accelerator is written in Verilog and deployed on a Xilinx VCU118. When the accelerator recognizes the CIFAR-10 dataset, its energy efficiency is 39.98 GOPS/W, a 1.73× improvement in energy efficiency over previous DCNN FPGA accelerators. When the accelerator recognizes the ImageNet dataset, its energy efficiency is 41.12 GOPS/W, which is 1.28× to 3.14× the energy efficiency of the other designs compared.
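The multiplier sharing idea can be illustrated with a generic operand-packing trick: two narrow signed multiplications that share one operand are folded into a single wide multiplication, and the two products are then separated. This is a sketch of that general technique only. The paper's actual DSP customization is not described in the abstract, and the 16-bit field width and function name here are assumptions.

```python
def packed_dual_multiply(a, w1, w2, shift=16):
    """Compute (w1 * a, w2 * a) with one wide multiplication.

    Assumes signed 8-bit operands, so each product fits in a signed
    16-bit field (|product| <= 128 * 128 = 16384 < 2**15).
    """
    packed = (w1 << shift) + w2          # place w1 in the high field
    p = packed * a                       # the single wide multiply
    low = p & ((1 << shift) - 1)         # raw low field, unsigned
    if low >= 1 << (shift - 1):          # reinterpret as signed
        low -= 1 << shift
    high = (p - low) >> shift            # remove the low product, then shift
    return high, low

example = packed_dual_multiply(-7, 12, -5)
print(example)   # the pair (12 * -7, -5 * -7)
```

Subtracting the signed low product before shifting is what keeps the high product correct when the low product is negative; a hardware version would implement the same correction with a small adder after the DSP block.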
Funding: Supported by the National Key R&D Program of China (No. 2018AAA0103300); the National Natural Science Foundation of China (Nos. 61925208, U20A20227, U22A2028); the Chinese Academy of Sciences Project for Young Scientists in Basic Research (No. YSBR-029); and the Youth Innovation Promotion Association of the Chinese Academy of Sciences.
Abstract: With the increasing demand for computational power in artificial intelligence (AI) algorithms, dedicated accelerators have become a necessity. However, the complexity of hardware architectures, the vast design search space, and the complex tasks of accelerators pose significant challenges. Traditional search methods can become prohibitively slow as the search space continues to expand. A design space exploration (DSE) method based on transfer learning is proposed, which reduces the time spent on repeated training and uses multi-task models for different tasks on the same processor. The proposed method accurately predicts the latency and energy consumption associated with neural network accelerator design parameters, enabling faster identification of optimal outcomes than traditional methods. Compared with other DSE methods that use a multilayer perceptron (MLP), the required training time is also shorter. Comparative experiments with other methods demonstrate that the proposed method improves the efficiency of DSE without compromising the accuracy of the results.