To effectively solve the problems of inconsistent communication protocols in automatic monitoring equipment and the limitations of data acquisition, transmission, and monitoring equipment, this paper develops a programmable single-point, multiple-output intelligent data acquisition and transmission system. It describes the system in depth in terms of hardware design, software architecture and principles, main functions, and technical parameters. Finally, it presents four innovations: (i) intelligent (automatic) matching of the various communication protocols used by environmental monitoring equipment; (ii) multi-protocol, multi-target parallel data transmission; (iii) remote dynamic input of control instructions over a wired or wireless network; and (iv) support for configuration (process) simulation of the DCS operating conditions of field equipment.
In this work, we explore the implications of having more than one output in a genetic programming (GP) graph representation. This approach, called multiple interactive outputs in a single tree (MIOST), is based on two ideas. First, we define an approach called interactivity within an individual (IWI), which is based on a graph-GP representation. Second, we add multiple outputs to the structures of individuals created with the IWI approach, which yields MIOST. As a first step, we analyze the effects of IWI using only mutation and examine its implications (i.e., the presence of neutrality). We then test the effectiveness of IWI when both mutation and standard GP crossover are allowed in the evolutionary process. Finally, we test the effectiveness of MIOST using mutation and crossover and report extensive empirical results on evolvable problems of varying complexity taken from the literature. The results reported in this paper indicate that the proposed approach has better overall performance in terms of consistency in reaching feasible solutions.
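The abstract does not give MIOST's encoding. The sketch below is only an assumed illustration of what "a graph representation with several outputs" can mean. Each node refers to earlier nodes, and several output indices read results from the same DAG, so a shared subexpression is evaluated once and feeds more than one output. `FUNCS`, `Node`, and `evaluate` are hypothetical names.

```python
import operator
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical primitive set; the paper's actual function set is not given here.
FUNCS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}

# A node is (function name, index of first argument, index of second argument).
Node = Tuple[str, int, int]


def evaluate(inputs: Sequence[float], nodes: List[Node], outputs: List[int]) -> List[float]:
    """Evaluate a graph-encoded individual that exposes several output nodes.

    Indices 0..len(inputs)-1 name the terminals and index len(inputs)+j names
    node j. Nodes may only point at lower indices, so one in-order pass
    evaluates the whole DAG, and a subgraph shared by several nodes or outputs
    is computed once.
    """
    values = list(inputs)
    for name, a, b in nodes:
        values.append(FUNCS[name](values[a], values[b]))
    return [values[i] for i in outputs]


# Terminals x = 3.0 (index 0) and y = 2.0 (index 1).
# Index 2 computes x + y and is reused by index 3 ((x+y)^2) and index 4 ((x+y)-y).
nodes = [("add", 0, 1), ("mul", 2, 2), ("sub", 2, 1)]
print(evaluate([3.0, 2.0], nodes, outputs=[3, 4]))  # [25.0, 3.0]
```

Sharing one DAG across outputs is what lets the outputs "interact": a mutation to node 2 changes both outputs at once.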
Complex systems are widespread and include medicines derived from natural products, functional foods, and biological samples. The biological activity of a complex system often results from the synergistic effect of multiple components, so quality evaluation of complex samples usually requires multicomponent quantitative analysis (MCQA). To overcome the difficulty of obtaining reference standards, researchers have proposed achieving MCQA through the “single standard to determine multiple components” (SSDMC) approach. This method has been used to determine the content of multiple components in drugs from natural sources and to analyze impurities in chemical drugs, and it has been included in the Chinese Pharmacopoeia. Building on a convenient (ultra) high-performance liquid chromatography method, how can the repeatability and robustness of MCQA be improved? How can the chromatographic conditions be optimized to increase the number of quantifiable components? How can computer software be used to improve the efficiency of multicomponent analysis (MCA)? These are key problems that remain to be solved in practical MCQA. First, this review summarizes the methods used in the past five years to calculate relative correction factors in the SSDMC approach, together with the evaluation of method robustness and accuracy. Second, it summarizes methods for improving peak capacity and quantitative accuracy in MCA, including column selection and two-dimensional chromatography. Finally, it introduces computer software for predicting chromatographic conditions and analytical parameters, which points toward intelligent method development in MCA. The paper aims to provide methodological ideas for improving the analysis of complex systems, especially MCQA.
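For orientation, one widely used way of writing the relative correction factor in SSDMC-type methods is sketched below. Conventions differ between papers (some define the ratio the other way round), so treat the symbols as assumptions: $A$ is peak area, $C$ is concentration, $s$ is the single reference standard, and $k$ is an analyte calibrated against it.

```latex
\begin{aligned}
f_i &= \frac{A_i}{C_i} && \text{(response factor of component } i\text{)}\\
f_{s/k} &= \frac{f_s}{f_k} = \frac{A_s / C_s}{A_k / C_k} = \frac{A_s\,C_k}{A_k\,C_s}\\
C_k &= \frac{A_k}{f_k} = f_{s/k}\,\frac{A_k\,C_s}{A_s}
\end{aligned}
```

Once $f_{s/k}$ has been established, routine analysis needs only the reference standard $s$. Its measured $A_s/C_s$, together with the analyte's peak area $A_k$, gives $C_k$.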
A design based on Taylor series expansion (TSE) for minimum mean-square error (MMSE) and QR decomposition (QRD) in multiple-input multiple-output (MIMO) systems is proposed, implemented on an application-specific instruction-set processor (ASIP). It uses the TSE algorithm in place of resource-intensive reciprocal and reciprocal square root (RSR) operations. The aim is a high-performance implementation of both MMSE and QRD on a single programmable platform. Furthermore, the instruction set architecture (ISA) and the allocation of data paths in a single instruction multiple data, very long instruction word (SIMD-VLIW) architecture are presented, offering more data-level and instruction-level parallelism for matrices of different dimensions and for different operation types. Meanwhile, multiple levels of numerical precision can be achieved through flexible table sizes and expansion orders in the TSE ISA. The ASIP has been implemented in a 28 nm CMOS process and reaches 800 MHz. Experimental results show that the proposed design provides perfect numerical precision within the fixed bit width of the ASIP, a matrix processing rate that exceeds the requirements of 5G systems, and rate-area efficiency comparable to ASIC implementations.
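The abstract does not describe the table layout, so the segment-wise expansion below is an assumed floating-point illustration of the general idea: a lookup table of per-segment coefficients plus a short Taylor polynomial replaces the reciprocal and RSR units. It is not the ASIP's fixed-point datapath. `table_size` and `order` stand in for the "flexible table size and expansion order" mentioned above. All function names are hypothetical.

```python
import math
from typing import List


def build_table(table_size: int, order: int, exponent: float) -> List[List[float]]:
    """Per-segment Taylor coefficients of m**exponent for m in [1, 2).

    Segment j is expanded around its left end a_j = 1 + j / table_size, so the
    coefficient of d**n (d = m - a_j) is binom(exponent, n) * a_j**(exponent - n).
    """
    table = []
    for j in range(table_size):
        a = 1.0 + j / table_size
        coeffs, binom = [], 1.0
        for n in range(order + 1):
            coeffs.append(binom * a ** (exponent - n))
            binom *= (exponent - n) / (n + 1)  # generalized binomial recurrence
        table.append(coeffs)
    return table


def tse_eval(m: float, table: List[List[float]]) -> float:
    """Evaluate the stored expansion for a mantissa m in [1, 2) with Horner's rule."""
    size = len(table)
    j = min(int((m - 1.0) * size), size - 1)  # table index from the leading mantissa bits
    d = m - (1.0 + j / size)
    acc = 0.0
    for c in reversed(table[j]):
        acc = acc * d + c
    return acc


def reciprocal(x: float, table: List[List[float]]) -> float:
    """1/x for x > 0, using a table built with exponent = -1."""
    frac, e = math.frexp(x)      # x = frac * 2**e, frac in [0.5, 1)
    m, k = 2.0 * frac, e - 1     # x = m * 2**k, m in [1, 2)
    return math.ldexp(tse_eval(m, table), -k)


def rsqrt(x: float, table: List[List[float]]) -> float:
    """1/sqrt(x) for x > 0, using a table built with exponent = -0.5."""
    frac, e = math.frexp(x)
    m, k = 2.0 * frac, e - 1
    r = tse_eval(m, table)       # approximates m**-0.5
    if k % 2:                    # odd k: 2**(-k/2) = 2**-0.5 * 2**(-(k-1)/2)
        r *= math.sqrt(0.5)
        k -= 1
    return math.ldexp(r, -(k // 2))


recip_table = build_table(table_size=64, order=3, exponent=-1.0)
rsr_table = build_table(table_size=64, order=3, exponent=-0.5)
for x in (0.3, 1.0, 7.5, 12345.6):
    print(x, reciprocal(x, recip_table), rsqrt(x, rsr_table))
```

A larger table or a higher order shrinks the truncation error, at the cost of more memory or more multiply-adds. That is the precision/resource trade-off the abstract refers to.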
CRISPR/Cas9-mediated genome editing is a powerful tool for life science research. Recently, strawberry (Fragaria × ananassa), an important horticultural crop, has emerged as a model organism for investigating the regulatory mechanisms of fruit development and ripening (Shulaev et al., 2011; Jia et al., 2013, 2017; Kang et al., 2013; Han et al., 2015). While most cultivated strawberries…
Powering electronic equipment by harvesting ambient energy is an active research topic. To match power supply and demand in real time as environmental conditions vary, a reconfigurable technique is adopted. In this paper, a dynamic power consumption model that uses a lookup table as its unit is proposed. We then establish a system-level task scheduling model based on task type. A practical dynamically reconfigurable system is built on a single instruction multiple data (SIMD) architecture comprising a processing system and a control system with a Nios II processor. The approach is evaluated on a hardware platform. Test results show that the system automatically adjusts its power consumption when the external energy input changes. During the first task assignment, utilization of the system's dynamic power ranges from 80.05% to 91.75%, and over the entire processing cycle the total energy efficiency is 97.67%.
Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews studies on exploiting the single instruction multiple data (SIMD) capability of processors to improve tree search performance, and it proposes several improvements to reported SIMD tree search algorithms. Building on a blocked tree structure, blocking for memory alignment and dynamic block prefetching are proposed to reduce memory access overhead. Furthermore, search branch unwinding, a form of non-linear loop unrolling, shows that the number of branches in the SIMD search algorithm can exceed the data width of SIMD instructions. Experiments suggest that the blocking-optimized SIMD tree search algorithm responds 1.6 times faster than the unoptimized algorithm.
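The abstract does not specify the node layout. The sketch below shows a common SIMD tree-search idea, assumed here and not taken from the paper: each node holds as many sorted keys as fit in one vector register, and the child to visit is found by comparing the search key with all of them at once and counting the hits. Python can only emulate that compare-and-count step. Blocking, alignment, and prefetching are not modeled. `Node`, `build`, and `search` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]  # sorted separators, one "vector register" worth
    children: List[Optional["Node"]] = field(default_factory=list)  # empty for a leaf


def build(sorted_keys: List[int], fanout: int) -> Optional[Node]:
    """Build a static k-ary search tree with at most fanout-1 keys per node."""
    n = len(sorted_keys)
    if n == 0:
        return None
    if n <= fanout - 1:
        return Node(list(sorted_keys))
    # Evenly spaced separator positions; the keys between them go to the children.
    pos = [(i + 1) * n // fanout for i in range(fanout - 1)]
    bounds = [-1] + pos + [n]
    children = [build(sorted_keys[bounds[i] + 1:bounds[i + 1]], fanout) for i in range(fanout)]
    return Node([sorted_keys[p] for p in pos], children)


def search(root: Optional[Node], key: int) -> bool:
    node = root
    while node is not None:
        # SIMD-style step: compare the key with every separator "at once" and
        # count lanes where separator < key (compare mask + popcount), instead
        # of branching after each individual comparison.
        c = sum(1 for s in node.keys if s < key)
        if c < len(node.keys) and node.keys[c] == key:
            return True
        node = node.children[c] if node.children else None
    return False


keys = list(range(0, 1000, 3))
root = build(keys, fanout=5)  # 4 keys per node, e.g. four 32-bit lanes of a 128-bit register
print(search(root, 300), search(root, 301))
```

The count `c` is both the comparison result and the child index, which is what removes the data-dependent branches from each level.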
An analytical model for dynamic recrystallization (DRX) is studied based on the relative grain size model proposed by Sakai and Jonas, and the characteristic flow behaviors under DRX are analyzed and simulated. By introducing the variation of dynamic grain size and the heterogeneous distribution of dislocation densities under DRX, a simple method for modeling and simulating DRX processes is developed using Laplace transformation theory. Results derived from the present model agree well with experimental results in the literature. The simulation reproduces a number of features of DRX flow behavior, for example, single- and multiple-peak flow behavior followed by steady-state flow, and the transition between them.
This paper studies the existence of single and multiple positive solutions for the first-order boundary value problem x′ = f(t, x), x(0) = x(T), where f ∈ C([0, T] × R). In addition, we apply our existence theorems to a class of nonlinear periodic boundary value problems with a singularity at the origin. Our proofs are based on a fixed point theorem in cones. Our results improve some recent results in the literature.
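A standard way to bring such a periodic problem into a cone fixed-point framework is sketched below. This is an assumption about the general technique; the abstract does not state the authors' operator. Pick a constant $M > 0$, rewrite the equation as $x' + Mx = f(t,x) + Mx$, and invert the left-hand side under the condition $x(0) = x(T)$. The linear problem $x' + Mx = h(t)$, $x(0) = x(T)$ then has the solution

```latex
\begin{aligned}
x(t) &= \int_0^T G(t,s)\,h(s)\,ds,\\
G(t,s) &= \begin{cases}
\dfrac{e^{-M(t-s)}}{1-e^{-MT}}, & 0 \le s \le t \le T,\\[2ex]
\dfrac{e^{-M(T+t-s)}}{1-e^{-MT}}, & 0 \le t < s \le T,
\end{cases}\\
\frac{e^{-MT}}{1-e^{-MT}} &\le G(t,s) \le \frac{1}{1-e^{-MT}}.
\end{aligned}
```

Solutions of the original problem are fixed points of $(Ax)(t) = \int_0^T G(t,s)\,[f(s,x(s)) + Mx(s)]\,ds$. When $f(t,x) + Mx \ge 0$, the two-sided bound on $G$ gives $\min_t (Ax)(t) \ge e^{-MT}\,\|Ax\|_\infty$. That is why a cone such as $\{x \ge 0 : \min_t x(t) \ge e^{-MT}\|x\|_\infty\}$ is a natural setting for positive solutions.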
Wave propagation is studied in structures consisting of alternating left- and right-handed layers. A Bragg gap and a zero-n gap appear in different frequency regions of the structure. The periodicity of the structure is broken simply by reversing the order of the layers in one half of the structure, which produces defect modes inside the zero-n gap and the Bragg gap. These modes can be made very narrow by adding more layers to the structure. The defect mode inside the zero-n gap is sensitive to the symmetry of the structure and insensitive to the angle of incidence of the incoming radiation. Multiple modes are also generated inside the gaps by repeating the structural pattern. Thus, a simple structure can provide single and multiple modes that are important for various applications.
In this paper, a method is proposed to improve the energy efficiency of a vertical-axis turbine. First, a single-disk multiple-stream-tube model is used to calculate individual fitness. A genetic algorithm is adopted to optimize the blade pitch motion of the vertical-axis turbine, with maximum energy efficiency as the optimization objective. Then, a data processing method is proposed that fits the resulting data to a cosine-like curve, from which a general formula for the blade motion is developed. Finally, CFD simulation is used to validate the blade pitch motion formula. The results show the following: the turbine's energy efficiency increases after the blade pitch motion is optimized; compared with a fixed-pitch turbine, the efficiency of the variable-pitch turbine is significantly improved by active blade pitch control; energy efficiency declines gradually as the speed ratio increases; and compactness has a larger effect on the blade motion, while the number of blades has little effect on it.
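The abstract says only that the optimized pitch data are fitted to a "cosine-like curve". The sketch below assumes the simplest such model, a constant plus a first harmonic, $\theta(\varphi) \approx c_0 + A\cos(\varphi + \psi)$ over one revolution. The paper's actual fitting form may include more terms. The sample data are synthetic stand-ins, not results from the paper.

```python
import math
from typing import List, Tuple


def fit_first_harmonic(phi: List[float], theta: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit theta(phi) ~ c0 + A*cos(phi + psi); returns (c0, A, psi).

    Assumes samples equally spaced over one full revolution (phi_i = 2*pi*i/N,
    N >= 3). Then 1, cos and sin are orthogonal over the samples, and the
    least-squares coefficients reduce to discrete projections.
    """
    n = len(phi)
    c0 = sum(theta) / n
    a = 2.0 / n * sum(t * math.cos(p) for p, t in zip(phi, theta))
    b = 2.0 / n * sum(t * math.sin(p) for p, t in zip(phi, theta))
    # c0 + a*cos(phi) + b*sin(phi) == c0 + A*cos(phi + psi) with A = hypot(a, b)
    return c0, math.hypot(a, b), math.atan2(-b, a)


N = 36
phi = [2 * math.pi * i / N for i in range(N)]                # azimuth samples (rad)
theta = [0.05 + 0.2 * math.cos(p + 0.4) for p in phi]       # hypothetical pitch samples (rad)
print(fit_first_harmonic(phi, theta))  # approximately (0.05, 0.2, 0.4)
```

With real GA output the residual of this fit shows how much of the optimized motion a single cosine captures. A poor fit would suggest adding higher harmonics.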
Feedback delay can severely degrade the quality of the channel state information at the transmitter (CSIT), which is fed back from the receiver. Outdated CSIT causes large performance losses in transmit beamforming systems. This paper studies the effect of variable feedback delay on the capacity of transmit beamforming systems over Rayleigh fading channels. First, the case of fixed feedback delay is considered, and a closed-form expression for the system capacity is derived. Building on the fixed-delay results, the delay in the variable-delay case is assumed to follow certain distributions, and closed-form capacity expressions are derived. These expressions show that capacity is significantly affected by the statistical characteristics of the feedback delay. The results provide analytical insight into how variable delay affects system capacity.
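The closed-form expressions themselves are not reproduced in the abstract. A Monte Carlo estimate is a simple way to sanity-check such results. The sketch below assumes a common outdated-CSIT model: $h = \rho\,h_{\text{old}} + \sqrt{1-\rho^2}\,e$, where $\rho$ is the correlation between the fed-back and the actual channel. Under Clarke's model, $\rho$ is often taken as $J_0(2\pi f_D \tau)$ for delay $\tau$. A fixed delay means a constant $\rho$; a variable delay means drawing $\rho$ per trial. The exponential delay-to-$\rho$ mapping in the usage lines is purely hypothetical.

```python
import math
import random
from typing import Callable


def cn(rng: random.Random) -> complex:
    """One circularly symmetric complex Gaussian sample with unit variance."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)


def ergodic_capacity(n_tx: int, snr: float, rho_fn: Callable[[random.Random], float],
                     trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[log2(1 + snr * |h^T w|^2)] with outdated CSIT.

    The transmitter beamforms along the fed-back channel h_old; the data see
    h = rho*h_old + sqrt(1 - rho^2)*e, with rho drawn per trial from rho_fn.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        rho = rho_fn(rng)
        h_old = [cn(rng) for _ in range(n_tx)]
        h = [rho * a + math.sqrt(1.0 - rho * rho) * cn(rng) for a in h_old]
        norm2 = sum(abs(a) ** 2 for a in h_old)
        # Matched beamformer w = conj(h_old)/||h_old||:
        # |h^T w|^2 = |sum_i h_i * conj(h_old_i)|^2 / ||h_old||^2
        gain = abs(sum(b * a.conjugate() for a, b in zip(h_old, h))) ** 2 / norm2
        total += math.log2(1.0 + snr * gain)
    return total / trials


fixed = ergodic_capacity(4, 10.0, lambda rng: 0.9)
# Hypothetical variable-delay model: delay ~ Exp(1), mapped to rho = exp(-delay).
variable = ergodic_capacity(4, 10.0, lambda rng: math.exp(-rng.expovariate(1.0)))
print(fixed, variable)
```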
A new technique for generating multi-channel optical pulses from a single laser diode (LD) is presented in this paper. A 35-channel pulse source with a 6.5 GHz repetition rate per channel and 32.5 GHz channel spacing was generated from a subharmonically hybrid mode-locked two-section monolithic laser with enhanced amplitude modulation. The resulting pulse source exhibits a high extinction ratio (>10 dB) and a low root mean square (RMS) phase noise (<0.11 rad) over all channels from 1556 nm to 1565...
The growing demand for semiconductor device simulation poses a major challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. Algorithmically, the original all-band conjugate gradient algorithm is refined to converge faster, and mixed-precision computing is adopted to increase overall efficiency. At the system level, the original two-layer parallel structure is replaced by a coarse-grained parallel method. Optimization strategies such as multi-stream execution, kernel fusion, and redundant computation removal are proposed to further increase utilization of the computational power of the heterogeneous machines. As a result, our optimized LS3DF scales to a system of 10 million silicon atoms, attaining a peak performance of 34.8 PFLOPS (21.2% of the peak). All the improvements can be adapted to next-generation supercomputers for larger simulations.
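The abstract assumes familiarity with how LS3DF assembles the whole system from fragments. In the patching scheme usually described for LS3DF, fragments of 1 to 2 grid cells per dimension are combined with alternating signs: $F_{222} - F_{221} - F_{212} - F_{122} + F_{211} + F_{121} + F_{112} - F_{111}$. The sketch below only checks the counting property behind that scheme on a small periodic grid: every cell ends up counted exactly once. It is an illustration, not the paper's code.

```python
from itertools import product
from typing import List


def patch_weights(nx: int, ny: int, nz: int) -> List[List[List[int]]]:
    """Net signed count of fragments covering each cell of a periodic grid.

    For every corner (i, j, k) and every fragment size (m1, m2, m3) in {1, 2}^3,
    the fragment covers cells i..i+m1-1 (and likewise in y, z) and carries the
    sign (-1)**(number of sizes equal to 1).
    """
    weight = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for i, j, k in product(range(nx), range(ny), range(nz)):
        for m in product((1, 2), repeat=3):
            sign = (-1) ** m.count(1)
            for di, dj, dk in product(range(m[0]), range(m[1]), range(m[2])):
                weight[(i + di) % nx][(j + dj) % ny][(k + dk) % nz] += sign
    return weight


w = patch_weights(3, 4, 2)
print(all(c == 1 for plane in w for row in plane for c in row))
```

Per dimension, each cell lies in two size-2 fragments and one size-1 fragment, so its weight is $(2 - 1)^3 = 1$. The signed sum of fragment quantities therefore reproduces the full system, while the artificial fragment surfaces are intended to cancel.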
To study and optimize the performance of general-purpose computing on graphics processing units (GPGPU) based on a single instruction multiple threads (SIMT) processor for neural network applications, this work contributes a self-developed SIMT processor named Pomelo and a corresponding assembly program. The parallel mechanism of the SIMT computing mode and of the Pomelo processor is briefly introduced. A common convolutional neural network (CNN) is built to verify the compatibility and functionality of the Pomelo processor. A CNN computing flow with task-level and hardware-level optimization is adopted on the Pomelo processor. A specific algorithm for organizing a Z-shaped memory structure is developed to reduce memory access in data-intensive computing tasks. Experimental results obtained with this combined adaptation and optimization strategy demonstrate that reducing memory access in SIMT computing mode plays a crucial role in improving performance. A 6.52-fold performance gain is achieved with 4 processing elements.
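The abstract does not define its "Z-shaped memory structure". One plausible reading, assumed here, is Z-order (Morton) layout, where the bits of the row and column indices are interleaved. The sketch shows the property that makes such layouts attractive for convolution: every aligned 2×2 (and 4×4) tile of a matrix occupies consecutive addresses. `morton2d` is a hypothetical helper.

```python
def morton2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x (even bit positions) and y (odd bit positions)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code


# Address of each element of a 4x4 tile when stored in Z order (rows are y).
for y in range(4):
    print([morton2d(x, y) for x in range(4)])
# [0, 1, 4, 5]
# [2, 3, 6, 7]
# [8, 9, 12, 13]
# [10, 11, 14, 15]
```

In row-major order the same 2×2 window would span two separate rows. In Z order it is one contiguous run of four elements, so fewer memory transactions are needed per window.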
Objective: To establish a quality control method for the simultaneous determination of multiple components in gamboge. Methods: A single reference standard for the determination of multiple components (SSDMC) method using HPLC was proposed. Seven major components of gamboge, namely gambogenic acid (S), β-morellic acid (C1), 2R-30-hydroxygambogic acid (C2), isogambogenic acid (C3), gambogellic acid (C4), 2R-gambogic acid (C5), and 2S-gambogic acid (C6), were analyzed simultaneously using gambogenic acid as the reference standard. The credibility and feasibility of the SSDMC method were validated with respect to linearity, limits of detection and quantification, precision, stability, repeatability, accuracy, ruggedness, and robustness. The relative conversion factors (RCFs) of S and C1–C6 were calculated. Twelve batches of gamboge, including crude and processed products, were successfully analyzed using both the SSDMC method and the traditional external standard (ES) method. Results: The SSDMC method was credible and feasible. The RCFs of S and C1–C6 were 1.000, 0.913, 0.864, 1.064, 0.777, 0.921, and 0.919, respectively. No significant difference in the contents of the seven components was observed between the SSDMC and ES methods. Heat processing reduced the contents of all seven components. Conclusion: SSDMC is a simple, reliable, and effective method for analyzing the complex multiple components of gamboge, and it is also a practical and economical approach.
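The arithmetic behind an SSDMC assay is short. The sketch below assumes the convention $f_{s/k} = (A_s/C_s)/(A_k/C_k)$, so that $C_k = f_{s/k}\,A_k C_s / A_s$. Whether the gamboge study's RCFs use this orientation or its inverse is not stated in the abstract, so its listed values should not be plugged in without checking. All areas and concentrations below are hypothetical.

```python
def relative_correction_factor(area_s: float, conc_s: float, area_k: float, conc_k: float) -> float:
    """f_{s/k} = (A_s / C_s) / (A_k / C_k), from a calibration run with both standards."""
    return (area_s / conc_s) / (area_k / conc_k)


def concentration_from_rcf(f_sk: float, area_k: float, area_s: float, conc_s: float) -> float:
    """C_k = f_{s/k} * A_k * C_s / A_s; routine runs need only the reference standard s."""
    return f_sk * area_k * conc_s / area_s


# Hypothetical calibration (areas in mAU*s, concentrations in ug/mL).
f_sk = relative_correction_factor(area_s=1200.0, conc_s=50.0, area_k=900.0, conc_k=40.0)
# Hypothetical routine run: A_s/C_s from the reference standard, A_k from the sample.
c_k = concentration_from_rcf(f_sk, area_k=1000.0, area_s=1500.0, conc_s=62.5)
print(round(f_sk, 4), round(c_k, 2))  # 1.0667 44.44
```

In this example the result equals what an external standard of analyte $k$ would give ($1000 / 22.5$). That agreement is the kind of SSDMC-versus-ES comparison the study reports.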
As an emerging joint learning model, federated learning is a promising way to combine the model parameters of different users for training and inference without collecting the users' original data. However, previous work has not established a practical and efficient solution, owing to the lack of efficient matrix computation and cryptographic schemes in privacy-preserving federated learning models, especially those based on partially homomorphic cryptosystems. In this paper, we propose a Practical and Efficient Privacy-preserving Federated Learning (PEPFL) framework. First, we present a lifted distributed ElGamal cryptosystem for federated learning, which solves the multi-key problem in federated learning. Second, we develop a Practical Partially Single Instruction Multiple Data (PSIMD) parallelism scheme that encodes a plaintext matrix into a single plaintext for encryption, improving encryption efficiency and reducing communication cost in the partially homomorphic cryptosystem. In addition, a novel privacy-preserving federated learning framework is designed using Momentum Gradient Descent (MGD), based on a Convolutional Neural Network (CNN) and the designed cryptosystem. Finally, we evaluate the security and performance of PEPFL. The experimental results demonstrate that the scheme is practical, effective, and secure, with low communication and computation costs.
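The PSIMD encoding itself is not detailed in the abstract. The sketch below shows the generic slot-packing idea it builds on, as an assumption: several small non-negative integers (here a flattened matrix) are placed in fixed-width bit slots of one large integer. One addition of packed integers then adds every slot at once, provided no slot overflows. With an additively homomorphic scheme, combining two ciphertexts corresponds to adding their plaintexts, given a large enough plaintext space. `pack` and `unpack` are hypothetical helpers, and no encryption is performed here.

```python
from typing import List


def pack(values: List[int], slot_bits: int) -> int:
    """Encode non-negative integers into one integer plaintext, one value per slot."""
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << slot_bits):
            raise ValueError(f"value {v} does not fit in {slot_bits} bits")
        packed |= v << (i * slot_bits)
    return packed


def unpack(packed: int, count: int, slot_bits: int) -> List[int]:
    """Read the slots back out (inverse of pack for in-range values)."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


# Row-major flattening of two 2x3 matrices into single plaintexts.
a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]
flat_a = [v for row in a for v in row]
flat_b = [v for row in b for v in row]
# One addition of the packed integers adds all six entries at once.
summed = unpack(pack(flat_a, 16) + pack(flat_b, 16), 6, 16)
print(summed)  # [11, 22, 33, 44, 55, 66]
```

The slot width has to leave headroom for every sum that will be accumulated (for example, many clients' gradients). Otherwise a carry spills into the neighbouring slot.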
This paper presents a time-dependent analysis of an M/M/1 queueing model with single and multiple working vacations, balking, and vacation interruptions. Whenever the system becomes empty, the server begins a working vacation. During the working vacation period, if the queue length reaches a positive threshold value ‘k’, the working vacation is interrupted and the server immediately begins normal service in an exhaustive manner. During working vacations, customers are discouraged by the slow service and exhibit balking behavior. The transient system-size probabilities of the proposed model are derived explicitly using generating functions and continued fractions. Performance measures such as the mean and variance of the system size are also obtained. Further, numerical simulations are presented to analyze the impact of the system parameters.
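The paper's closed forms come from generating functions and continued fractions. A numerical cross-check is to solve the forward equations of a truncated chain directly. The sketch below does this by uniformization for a simplified, hypothetical version of the model. Arrivals join a vacationing server with probability `beta` (balking otherwise). The vacation is interrupted when the number in system reaches `k`. A vacation that ends with an empty system is followed by another one. The system size is capped at `N`. The single-versus-multiple vacation choice of the paper is not modeled, and all parameter values are illustrative.

```python
import math
from typing import List, Tuple

Transition = Tuple[int, int, float]  # (from state, to state, rate)


def build_chain(N: int, k: int, lam: float, beta: float, mu: float,
                mu_v: float, theta: float) -> Tuple[List[Transition], int]:
    """Truncated CTMC; state (n, mode) -> index 2*n + mode, mode 0 = vacation, 1 = normal."""
    idx = lambda n, mode: 2 * n + mode
    trans: List[Transition] = []
    for n in range(N + 1):
        if n >= 1:  # normal service; an emptied system starts a new working vacation
            if n < N:
                trans.append((idx(n, 1), idx(n + 1, 1), lam))
            trans.append((idx(n, 1), idx(n - 1, 1) if n > 1 else idx(0, 0), mu))
        if n < N:   # vacation arrival (after balking); reaching k interrupts the vacation
            trans.append((idx(n, 0), idx(n + 1, 1 if n + 1 >= k else 0), beta * lam))
        if n >= 1:
            trans.append((idx(n, 0), idx(n - 1, 0), mu_v))  # slow service during vacation
            trans.append((idx(n, 0), idx(n, 1), theta))     # vacation ends, normal service
    return trans, 2 * (N + 1)


def transient(trans: List[Transition], size: int, p0: List[float], t: float,
              tol: float = 1e-12) -> List[float]:
    """p(t) = p0 * exp(Q t) by uniformization (keep rate*t moderate to avoid underflow)."""
    out = [0.0] * size
    for s, _, r in trans:
        out[s] += r
    rate = max(out) or 1.0
    v = list(p0)
    weight = math.exp(-rate * t)  # Poisson(rate*t) probability of zero jumps
    result = [weight * x for x in v]
    acc, m = weight, 0
    while 1.0 - acc > tol and m < 100000:
        m += 1
        nxt = [v[s] * (1.0 - out[s] / rate) for s in range(size)]
        for s, d, r in trans:
            nxt[d] += v[s] * r / rate
        v = nxt
        weight *= rate * t / m
        acc += weight
        result = [x + weight * y for x, y in zip(result, v)]
    return result


trans, size = build_chain(N=40, k=5, lam=1.0, beta=0.8, mu=1.5, mu_v=0.5, theta=0.2)
p0 = [0.0] * size
p0[0] = 1.0  # start empty, on a working vacation
p = transient(trans, size, p0, t=5.0)
mean = sum((i // 2) * pi for i, pi in enumerate(p))
print(round(sum(p), 6), mean)
```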
Computer vision (CV) algorithms are now used extensively in a myriad of applications. Because multimedia data are generally well formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performance of CV algorithms. Single Instruction Multiple Data (SIMD) instructions, which perform the same operation on multiple data items in a single instruction, are widely employed to improve the efficiency of CV algorithms. In this paper, we evaluate the power and effectiveness of the RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge Detection. Our examination shows that, compared with the baseline OpenCV implementation using scalar instructions, equivalent implementations using RV-V (version 0.8) can reduce the instruction count of the same CV algorithm by up to 24x when processing the same input images. However, the actual performance improvement, measured in cycle counts, depends strongly on the specific implementation of the underlying RV-V co-processor. In our evaluation, using the vector co-processor (with eight execution lanes) of the Xuantie C906, the vectorized CV algorithms achieve average speedups of up to 2.98x over their scalar counterparts.
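To make the instruction-count argument concrete, the sketch below contrasts a scalar grayscale loop with the same computation organized the way vector-length-agnostic RVV code is usually written. Each pass requests up to `vlmax` lanes (roughly the role `vsetvli` plays), so the tail needs no separate loop. Python only emulates the structure: each list comprehension stands for one vector instruction over all active lanes. The fixed-point weights 77/150/29 (an approximation of the 0.299/0.587/0.114 luma weights) are an assumption, not the paper's or OpenCV's exact constants.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit R, G, B


def gray_scalar(rgb: List[Pixel]) -> List[int]:
    """One pixel per iteration; fixed-point weights 77/150/29 sum to 256."""
    return [(77 * r + 150 * g + 29 * b + 128) >> 8 for r, g, b in rgb]


def gray_strip_mined(rgb: List[Pixel], vlmax: int = 8) -> List[int]:
    """Same result, processed vl pixels at a time (strip mining)."""
    out: List[int] = []
    i = 0
    while i < len(rgb):
        vl = min(vlmax, len(rgb) - i)                  # lanes granted for this strip
        chunk = rgb[i:i + vl]
        r = [p[0] for p in chunk]                      # de-interleave the channels
        g = [p[1] for p in chunk]
        b = [p[2] for p in chunk]
        acc = [77 * x for x in r]                      # widening multiply
        acc = [a + 150 * y for a, y in zip(acc, g)]    # widening multiply-accumulate
        acc = [a + 29 * z for a, z in zip(acc, b)]
        out.extend((a + 128) >> 8 for a in acc)        # round and narrow to 8 bits
        i += vl
    return out


pixels = [(i * 37 % 256, i * 91 % 256, i * 53 % 256) for i in range(21)]
print(gray_scalar(pixels) == gray_strip_mined(pixels))  # 21 pixels in 3 strips of <= 8
```

The scalar version executes its loop body 21 times. The strip-mined version executes it 3 times with eight-wide operations, which is where an instruction-count reduction comes from. Actual cycle counts still depend on how many lanes the hardware really has, as the abstract notes.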
Global financial markets generate a tremendous amount of data every day, and such time-series data need to be analyzed in real time to explore their potential value. In recent years, machine learning models have been successfully adopted on financial data, where the importance of accuracy and timeliness demands highly efficient computing frameworks. However, traditional financial time-series processing frameworks have shown performance degradation and adaptation issues, such as the handling of outliers caused by stock suspensions in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports various acceleration techniques, such as streaming algorithms, vectorized instruction sets, and memory optimization. It also provides functions such as time-window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. Benchmark and incremental analysis results demonstrate HXPY's superior performance compared with its counterparts. For data ranging from MiBs to GiBs, HXPY significantly outperforms other in-memory dataframe computing frameworks, in some cases by hundreds of times.
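HXPY's internals are not described in the abstract. The sketch below is a generic example of the "streaming algorithm" idea applied to a time-window function, as an assumption. It is a rolling mean updated in O(1) per new value that skips NaN entries, such as prices on suspended trading days, instead of letting them poison the window. `rolling_mean` is a hypothetical name and not HXPY's API. The window counts rows, not wall-clock time.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator


def rolling_mean(xs: Iterable[float], window: int, min_periods: int = 1) -> Iterator[float]:
    """Streaming mean of the last `window` values, ignoring NaNs (e.g. suspended days)."""
    buf: Deque[float] = deque()
    total, count = 0.0, 0
    for x in xs:
        buf.append(x)
        if not math.isnan(x):
            total += x
            count += 1
        if len(buf) > window:          # evict the value that left the window
            old = buf.popleft()
            if not math.isnan(old):
                total -= old
                count -= 1
        yield total / count if count >= min_periods else math.nan


print(list(rolling_mean([1.0, 2.0, math.nan, 4.0, 5.0], window=3)))  # [1.0, 1.5, 1.5, 3.0, 4.5]
```

Each update touches only the incoming and outgoing values, so cost does not grow with the window length. A production version would also guard against rounding drift in the running sum, for example by periodically recomputing it.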
Funding (MIOST genetic programming paper): This paper was supported by the Mexican Consejo Nacional de Ciencia y Tecnologia (CONACyT) for postgraduate studies at the University of Essex.
Funding (SSDMC/MCQA review): the National Natural Science Foundation of China (Grant No.: 81803734) and the National S&T Major Special Project for New Innovative Drugs Sponsored (Grant No.: 2019ZX09201005).
Funding (TSE-based MMSE/QRD ASIP paper): Supported by the Industrial Internet Innovation and Development Project of the Ministry of Industry and Information Technology (No. GHBJ2004).
Funding (strawberry CRISPR/Cas9 paper): supported by the National Natural Science Foundation of China (Nos. 31572104, 31772284, 31471851 and 31672133), the Fok Ying-Tong Education Foundation of China (No. 151027), and the Beijing Key Laboratory of New Technology in Agricultural Application (kf2016023).
Funding (ambient-energy reconfigurable system paper): supported by the National Natural Science Foundation of China under Grants No. 61176025 and No. 61006027, the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2012J003, the National Laboratory of Analogue Integrated Circuit Grants under Grants No. 9140C0901101002 and No. 9140C0901101003, and the New Century Excellent Talents Program under Grant No. NCET-10-0297.
Funding: Project supported by the Shanghai Leading Academic Discipline Project (Grant No. J50103) and the Graduate Student Innovation Foundation of Shanghai University (Grant No. SHUCX112167).
Abstract: Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews studies on exploiting the single instruction multiple data (SIMD) capability of processors to improve tree search performance, and proposes several improvements to reported SIMD tree search algorithms. Building on a blocked tree structure, blocking for memory alignment and dynamic block prefetching are proposed to reduce memory access overhead. Furthermore, search branch unwinding, a form of non-linear loop unrolling, shows that the number of branches in the SIMD search algorithm can exceed the data width of the SIMD instructions. The experiments suggest that the blocking-optimized SIMD tree search algorithm responds 1.6 times faster than the unoptimized algorithm.
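The core step of SIMD tree search (compare the search key against every key in a block at once, then use the number of smaller keys as the branch index) can be sketched in scalar Python. This is an illustration only: `simd_count` stands in for a vector compare followed by a mask popcount, and the two-level `BlockedIndex` layout and its block size of 4 are assumptions, not the paper's data structure.

```python
def simd_count(lanes, key, inclusive):
    """Scalar stand-in for a SIMD compare plus mask popcount over one block."""
    if inclusive:
        return sum(1 for v in lanes if v <= key)
    return sum(1 for v in lanes if v < key)


class BlockedIndex:
    """Two-level blocked search structure over sorted keys (illustrative only)."""

    def __init__(self, sorted_keys, block=4):
        # block=4 mimics four 32-bit lanes of a 128-bit register (an assumption)
        self.blocks = [sorted_keys[i:i + block] for i in range(0, len(sorted_keys), block)]
        # Separator node: first key of every block except the first
        self.separators = [b[0] for b in self.blocks[1:]]

    def contains(self, key):
        if not self.blocks:
            return False
        # Branch-free child selection: number of separators <= key
        leaf = self.blocks[simd_count(self.separators, key, inclusive=True)]
        # Position of the key inside the chosen block
        pos = simd_count(leaf, key, inclusive=False)
        return pos < len(leaf) and leaf[pos] == key


index = BlockedIndex(list(range(1, 40, 2)))  # odd keys 1..39
print(index.contains(9), index.contains(8))  # True False
```

Because each level is resolved by a count rather than a chain of if/else comparisons, the loop has no data-dependent branches; a real implementation would also align each block to the vector register width, which is what blocking for memory alignment targets.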
Abstract: An analytical model for dynamic recrystallization (DRX) is studied based on the relative grain size model proposed by Sakai and Jonas, and the characteristic flow behaviors under DRX are analyzed and simulated. By introducing the variation of dynamic grain size and the heterogeneous distribution of dislocation densities under DRX, a simple method for modeling and simulating DRX processes is developed using Laplace transformation theory. The results derived from the present model agree well with experimental results in the literature. The simulation can reproduce a number of features of DRX flow behavior, for example, single-peak and multiple-peak flow behaviors followed by steady-state flow, and the transition between them.
Funding: Science Foundation for Young Teachers of Northeast Normal University (No. 20060108), the National Natural Science Foundation of China (No. 10571021), and the Key Laboratory for Applied Statistics of MOE (KLAS).
Abstract: This paper is devoted to the study of the existence of single and multiple positive solutions for the first-order boundary value problem x′ = f(t, x), x(0) = x(T), where f ∈ C([0, T] × ℝ). In addition, we apply our existence theorems to a class of nonlinear periodic boundary value problems with a singularity at the origin. Our proofs are based on a fixed point theorem in cones. Our results improve some recent results in the literature.
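A common way to set up such a cone argument is assumed here; the abstract does not give the paper's operator or constants. The idea is to shift by a constant M > 0 and recast the periodic problem as a fixed point of an integral operator with a positive Green's function. If f(t, x) + Mx ≥ 0 for x ≥ 0, the bounds on G show that A maps the cone K into itself. Nonzero fixed points of A in K are then positive solutions of x′ = f(t, x), x(0) = x(T).

```latex
% Shift by a constant M > 0:  x' + Mx = f(t,x) + Mx,  x(0) = x(T).
\begin{aligned}
(Ax)(t) &= \int_0^T G(t,s)\,\bigl[f(s,x(s)) + M x(s)\bigr]\,ds, \\
G(t,s) &= \frac{1}{1-e^{-MT}}
  \begin{cases}
    e^{-M(t-s)},   & 0 \le s \le t \le T, \\
    e^{-M(T+t-s)}, & 0 \le t < s \le T,
  \end{cases} \\
\frac{e^{-MT}}{1-e^{-MT}} &\le G(t,s) \le \frac{1}{1-e^{-MT}}, \\
K &= \bigl\{x \in C[0,T] : x(t) \ge e^{-MT}\,\|x\|_\infty \ \text{for all } t \in [0,T]\bigr\}.
\end{aligned}
```

The ratio of the lower to the upper bound on G, namely e^{-MT}, is exactly the constant that defines K.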
Abstract: Wave propagation is studied in structures consisting of alternating left- and right-handed layers. A Bragg gap and a zero-n gap appear in different frequency regions of the structure. The periodicity of the structure is broken by simply reversing the order of the layers in one half of the structure, which produces defect modes inside the zero-n gap and the Bragg gap. These modes can be made very narrow by adding more layers to the structure. The defect mode inside the zero-n gap is sensitive to the symmetry of the structure and insensitive to the angle of incidence of the incoming radiation. Multiple modes are also generated inside the gaps by repeating the structural pattern. Thus, a simple structure can be used to obtain single and multiple modes that are important for different applications.
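Transmission through such layered stacks is commonly computed with the transfer (characteristic) matrix method. The abstract does not say which method the authors used, so the following is a minimal sketch under stated assumptions. It assumes normal incidence, lossless and non-dispersive layers described by (eps, mu, d), and vacuum on both sides. For a left-handed layer (eps < 0 and mu < 0), the refractive index is taken to be negative. Realistic left-handed layers are dispersive, so a frequency scan should recompute eps and mu for each k0.

```python
import math


def layer_matrix(eps, mu, d, k0):
    """Characteristic matrix of one lossless layer at normal incidence.

    k0 = omega / c. Both eps and mu must share a sign (double-positive or
    double-negative); for a left-handed layer the index is taken negative.
    """
    n = math.sqrt(eps * mu)
    if eps < 0 and mu < 0:
        n = -n
    eta = n / mu        # relative admittance; positive for RH and LH layers
    delta = n * k0 * d  # phase thickness; negative inside an LH layer
    c, s = math.cos(delta), math.sin(delta)
    return [[c, 1j * s / eta], [1j * eta * s, c]]


def matmul(a, b):
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def transmittance(layers, k0):
    """Power transmittance of a stack of (eps, mu, d) layers embedded in vacuum."""
    m = [[1, 0], [0, 1]]
    for eps, mu, d in layers:
        m = matmul(m, layer_matrix(eps, mu, d, k0))
    t = 2 / (m[0][0] + m[0][1] + m[1][0] + m[1][1])  # vacuum admittance = 1 on both sides
    return abs(t) ** 2


# An RH layer followed by an LH layer of equal thickness: the phases cancel (zero average index)
print(transmittance([(1.0, 1.0, 0.7), (-1.0, -1.0, 0.7)], k0=1.0))
```

The RH/LH pair multiplies to the identity matrix, which shows in miniature why the phase accumulated in a left-handed layer can cancel that of a right-handed one. That cancellation is the origin of the zero-n gap. A defect structure like the one in the abstract can be modeled by concatenating one half of the layer list with its reversed copy.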
Funding: Financially supported by the National Natural Science Foundation of China (Grant No. 51309069), the Special Funded of Innovational Talents of Science and Technology in Harbin (Grant No. RC2014QN001008), the China Postdoctoral Science Foundation (Grant No. 2014M561334), and the Heilongjiang Postdoctoral Science Foundation (Grant No. LBH-Z14060).
Abstract: In this paper, a method is proposed to improve the energy efficiency of the vertical axis turbine. First, a single-disk multiple-stream-tube model is used to calculate the fitness of each individual. A genetic algorithm is adopted to optimize the blade pitch motion of the vertical axis turbine, with maximum energy efficiency as the optimization objective. Then, a particular data processing method is proposed that fits the resulting data to a cosine-like curve. After that, a general formula for calculating the blade motion is developed. Finally, CFD simulation is used to validate the blade pitch motion formula. The results show that the turbine's energy efficiency increases after the blade pitch motion is optimized; compared with the fixed-pitch turbine, the efficiency of the variable-pitch turbine is significantly improved by active blade pitch control; the energy efficiency declines gradually as the speed ratio increases; and compactness has a larger effect on the blade motion, while the number of blades has little effect on it.
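The step of fitting optimized pitch data to a cosine-like curve can be sketched as a linear least-squares problem. The exact functional form used in the paper is not stated. This sketch assumes a single harmonic, θ(φ) = A cos φ + B sin φ + C, which equals R cos(φ − φ0) + C with R = √(A² + B²) and φ0 = atan2(B, A). Here φ is the blade azimuth and θ is the pitch angle.

```python
import math


def fit_cosine(phis, thetas):
    """Least-squares fit of theta(phi) = A*cos(phi) + B*sin(phi) + C.

    Returns (amplitude, phase, offset) such that
    theta(phi) ~= amplitude * cos(phi - phase) + offset.
    """
    # Normal equations (X^T X) w = X^T y for the basis [cos, sin, 1]
    rows = [(math.cos(p), math.sin(p), 1.0) for p in phis]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, thetas)) for i in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution
    w = [0.0] * 3
    for i in (2, 1, 0):
        w[i] = (m[i][3] - sum(m[i][j] * w[j] for j in range(i + 1, 3))) / m[i][i]
    a, b, c = w
    return math.hypot(a, b), math.atan2(b, a), c


# Synthetic pitch data (degrees) sampled every 10 degrees of azimuth
phis = [2 * math.pi * k / 36 for k in range(36)]
thetas = [10 * math.cos(p - 0.5) + 2 for p in phis]
amp, phase, offset = fit_cosine(phis, thetas)
print(round(amp, 6), round(phase, 6), round(offset, 6))
```

If the GA output is clearly non-sinusoidal, the same approach extends to more harmonics by adding cos(kφ) and sin(kφ) columns to the basis.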
Funding: Supported by the Natural Science Foundation of Shanghai (09ZR1430500) and the Chinese National Science and Technology Major Project (2011ZX03003-001-01, 2009ZX03002-003-004).
Abstract: Feedback delay can severely degrade the quality of the channel state information at the transmitter (CSIT), which is fed back from the receiver. Outdated CSIT causes large performance losses in transmit beamforming systems. This paper studies the effect of variable feedback delay on the capacity of transmit beamforming systems over Rayleigh fading channels. First, the case of fixed feedback delay is considered, and a closed-form expression for the system capacity is derived. Building on the fixed-delay results, the variable-delay case is analyzed by assuming that the delay follows certain distributions, and closed-form capacity expressions are derived. These expressions show that the capacity is significantly affected by the statistical characteristics of the feedback delay. The results provide analytical insight into the effects of variable delay on system capacity.
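Closed-form capacity results of this kind are often cross-checked by Monte Carlo simulation. The sketch below does not reproduce the paper's expressions. It rests on assumed, commonly used models. First, the actual channel is h_now = ρ·h_old + √(1 − ρ²)·e, with e independent Rayleigh fading. Second, the correlation follows the Jakes model ρ = J0(2π f_D τ). Third, the transmitter applies maximum-ratio beamforming computed from the outdated h_old. Variable delay is modeled by drawing τ from a list on each trial.

```python
import math
import random


def bessel_j0(x, terms=40):
    # Power series J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2; adequate for moderate x
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= -(x / 2) ** 2 / (k * k)
        total += term
    return total


def cgauss(rng):
    # Unit-variance circularly symmetric complex Gaussian sample
    return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)


def ergodic_capacity(n_tx, snr, delays, doppler_hz, trials=10000, seed=1):
    """Monte Carlo estimate of E[log2(1 + snr * |h_now . w|^2)] in bit/s/Hz.

    w is the MRT beamformer built from outdated CSI; each trial draws a delay
    from `delays`, so a list with several entries models variable delay.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        rho = bessel_j0(2 * math.pi * doppler_hz * rng.choice(delays))
        h_old = [cgauss(rng) for _ in range(n_tx)]
        spread = math.sqrt(max(0.0, 1.0 - rho * rho))
        h_now = [rho * a + spread * cgauss(rng) for a in h_old]
        norm = math.sqrt(sum(abs(a) ** 2 for a in h_old))
        w = [a.conjugate() / norm for a in h_old]  # MRT weights from outdated CSI
        gain = abs(sum(a * b for a, b in zip(h_now, w))) ** 2
        acc += math.log2(1 + snr * gain)
    return acc / trials


fresh = ergodic_capacity(4, 10.0, delays=[0.0], doppler_hz=100.0)
stale = ergodic_capacity(4, 10.0, delays=[0.002, 0.005, 0.01], doppler_hz=100.0)
print(round(fresh, 2), round(stale, 2))
```

With zero delay, the beamforming gain is the full ‖h‖². As the delay grows, ρ shrinks and the gain approaches that of a single antenna. This is the degradation the closed-form expressions quantify.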
Abstract: A new technique for generating multi-channel optical pulses from a single laser diode (LD) is presented in this paper. A 35-channel pulse source with a 6.5 GHz repetition rate per channel and 32.5 GHz channel spacing was generated from a subharmonically hybrid mode-locked two-section monolithic laser with enhanced amplitude modulation. The obtained pulse source exhibits a high extinction ratio (>10 dB) and a low root mean square (RMS) phase noise (<0.11 rad) over all channels from 1556 nm to 1565...
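The quoted channel parameters can be checked with a short worked calculation. It assumes a center wavelength of 1560.5 nm, the midpoint of the 1556 to 1565 nm range given above. The channel spacing is exactly five times the per-channel repetition rate. The frequency span of 35 channels converts to about 9 nm of optical bandwidth, which is consistent with the stated wavelength range.

```latex
\begin{aligned}
\frac{\Delta\nu_{\text{ch}}}{f_{\text{rep}}} &= \frac{32.5\ \text{GHz}}{6.5\ \text{GHz}} = 5, \\
\Delta\nu_{\text{total}} &= (35-1)\times 32.5\ \text{GHz} = 1105\ \text{GHz}, \\
\Delta\lambda &\approx \frac{\lambda^{2}\,\Delta\nu_{\text{total}}}{c}
  = \frac{(1560.5\ \text{nm})^{2}\times 1.105\times 10^{12}\ \text{Hz}}{2.998\times 10^{8}\ \text{m/s}}
  \approx 8.98\ \text{nm}.
\end{aligned}
```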
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2021YFB0300600, the National Natural Science Foundation of China under Grant Nos. 92270206, T2125013, 62032023, 61972377, T2293702, and 12274360, the Chinese Academy of Sciences Project for Young Scientists in Basic Research under Grant No. YSBR-005, the Network Information Project of Chinese Academy of Sciences under Grant No. CASWX2021SF-0103, and the Key Research Program of Chinese Academy of Sciences under Grant No. ZDBSSSW-WHC002.
Abstract: The growing demand for semiconductor device simulation poses a big challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and highly efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. In terms of algorithmic optimizations, the original all-band conjugate gradient algorithm is refined to achieve faster convergence, and mixed precision computing is adopted to increase overall efficiency. In terms of system-level optimizations, the original two-layer parallel structure is replaced by a coarse-grained parallel method. Optimization strategies such as multi-stream, kernel fusion, and redundant computation removal are proposed to further increase utilization of the computational power provided by the heterogeneous machines. As a result, our optimized LS3DF can scale to a 10-million-silicon-atom system, attaining a peak performance of 34.8 PFLOPS (21.2% of the peak). All the improvements can be adapted to next-generation supercomputers for larger simulations.
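LS3DF computes small overlapping fragments independently and then combines them with alternating signs so that artificial boundary contributions cancel. The commonly described 3D patching scheme is F222 − F221 − F212 − F122 + F211 + F121 + F112 − F111, with fragments placed at every grid corner. The sketch below is not taken from this paper. It checks only the counting identity behind that scheme: on a periodic m × m × m grid, the signed fragment coverage of every cell is exactly 1.

```python
from itertools import product


def patch_coverage(m):
    """Signed coverage of each cell of a periodic m x m x m grid by LS3DF-style fragments.

    Fragments of size (a, b, c) with a, b, c in {1, 2} start at every grid corner;
    a fragment's sign is (-1) ** (number of dimensions with size 1).
    """
    cover = [0] * (m ** 3)
    for i, j, k in product(range(m), repeat=3):
        for a, b, c in product((1, 2), repeat=3):
            sign = (-1) ** [a, b, c].count(1)
            for di, dj, dk in product(range(a), range(b), range(c)):
                idx = ((i + di) % m) * m * m + ((j + dj) % m) * m + (k + dk) % m
                cover[idx] += sign
    return cover


print(set(patch_coverage(4)))  # {1}
```

The identity holds because the sign factorizes per dimension: each cell is covered by two size-2 pieces (+1 each) and one size-1 piece (−1), giving 2 − 1 = 1 per dimension and 1³ = 1 overall. As a separate arithmetic check, 34.8 PFLOPS at 21.2% of peak implies a theoretical peak of about 34.8 / 0.212 ≈ 164 PFLOPS for the hardware used in that run.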
Funding: The Scientific Research Program Funded by Shaanxi Provincial Education Department (20JY058).
Abstract: To study and optimize the performance of general-purpose computing on graphics processing units (GPGPU) based on a single instruction multiple threads (SIMT) processor for neural network applications, this work contributes a self-developed SIMT processor named Pomelo and a corresponding assembly program. The parallel mechanism of the SIMT computing mode and the self-developed Pomelo processor is briefly introduced. A common convolutional neural network (CNN) is built to verify the compatibility and functionality of the Pomelo processor. A CNN computing flow with task-level and hardware-level optimization is adopted on the Pomelo processor. A specific algorithm for organizing a Z-shaped memory structure is developed to reduce memory access in mass data computing tasks. With this combined adaptation and optimization strategy applied, the experimental results demonstrate that reducing memory access in the SIMT computing mode plays a crucial role in improving performance. A 6.52-fold performance improvement is achieved in the case of 4 processing elements.
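One common interpretation of a Z-shaped memory structure is Z-order (Morton) layout. This is an assumption, since the abstract does not define the Pomelo layout. In Z-order, the bits of the row and column indices are interleaved, so neighboring 2 × 2, 4 × 4, and larger tiles occupy contiguous memory. The helper names below are hypothetical.

```python
def part1by1(n):
    # Spread the low 16 bits of n so that they occupy the even bit positions
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(x, y):
    """Z-order index of element (x, y): x bits on even positions, y bits on odd."""
    return part1by1(x) | (part1by1(y) << 1)


def to_z_order(matrix):
    """Flatten a square matrix (side a power of two) into Z-order."""
    n = len(matrix)
    out = [None] * (n * n)
    for y in range(n):
        for x in range(n):
            out[morton_index(x, y)] = matrix[y][x]
    return out


grid = [[4 * y + x for x in range(4)] for y in range(4)]  # row-major values 0..15
print(to_z_order(grid)[:8])  # [0, 1, 4, 5, 2, 3, 6, 7]
```

A convolution window that touches a small square neighborhood then tends to read a short contiguous run of addresses rather than several rows far apart. This is one plausible mechanism for the reduction in memory accesses.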
Funding: Science and Technology Commission of Shanghai Municipality (13ZR1442000) and Shanghai Municipal Education Commission (2014YSN20) Support Program.
Abstract: Objective: To establish a quality control method for the simultaneous determination of multiple components in gamboge. Methods: A single reference standard for the determination of multiple components (SSDMC) method using HPLC was proposed. Seven major components of gamboge, including gambogenic acid (S), β-morellic acid (C1), 2R-30-hydroxygambogic acid (C2), isogambogenic acid (C3), gambogellic acid (C4), 2R-gambogic acid (C5), and 2S-gambogic acid (C6), were simultaneously analyzed using gambogenic acid as the reference standard. The credibility and feasibility of the SSDMC method were validated with respect to linearity, limits of detection and quantification, precision, stability, repeatability, accuracy, ruggedness, and robustness. The relative conversion factors (RCFs) of S and C1-6 were calculated. Twelve batches of gamboge, including crude and processed products, were successfully analyzed by applying the SSDMC and traditional external standard (ES) methods. Results: The SSDMC method was credible and feasible. The RCFs of S and C1-6 were 1.000, 0.913, 0.864, 1.064, 0.777, 0.921, and 0.919, respectively. No significant difference was observed in the contents of the seven components between the SSDMC and ES methods. The heat-processing technique caused a reduction in the seven components. Conclusion: SSDMC is a simple, reliable, and effective method for analyzing the complex multiple components in gamboge, and it is also a practical and economical approach.
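The quantification step behind a single-reference-standard method can be sketched as follows. Definitions of the relative conversion factor vary between papers, and the abstract does not state which one is used. This sketch assumes one common form, f_k = (A_s / C_s) / (A_k / C_k), where A is the peak area and C the concentration. Under this form, C_k = f_k · A_k · C_s / A_s, and the reference compound itself has f = 1, matching the RCF of 1.000 reported for S.

```python
def relative_conversion_factor(area_s, conc_s, area_k, conc_k):
    """f_k = (A_s / C_s) / (A_k / C_k), determined once from calibration standards."""
    return (area_s / conc_s) / (area_k / conc_k)


def quantify(area_k, f_k, area_s, conc_s):
    """C_k = f_k * A_k * C_s / A_s, needing only the reference standard in each run."""
    return f_k * area_k * conc_s / area_s


# Calibration run: reference S gives 1000 area units at 50 ug/mL, component k gives 600 at 40 ug/mL
f_k = relative_conversion_factor(1000, 50, 600, 40)
# Sample run: S is quantified by external standard (40 ug/mL, area 800); k shows area 300
print(quantify(300, f_k, 800, 40))
```

The practical advantage is that only the reference compound needs a calibrated standard in routine runs; the other components are quantified through their stored conversion factors.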
Funding: Supported by the National Natural Science Foundation of China under Grant No. U19B2021, the Key Research and Development Program of Shaanxi under Grant No. 2020ZDLGY08-04, the Key Technologies R&D Program of He’nan Province under Grant No. 212102210084, and the Innovation Scientists and Technicians Troop Construction Projects of Henan Province.
Abstract: As an emerging joint learning model, federated learning is a promising way to combine the model parameters of different users for training and inference without collecting users' original data. However, previous work has not established a practical and efficient solution, because privacy-preserving federated learning models lack efficient matrix computation and cryptography schemes, especially in partially homomorphic cryptosystems. In this paper, we propose a Practical and Efficient Privacy-preserving Federated Learning (PEPFL) framework. First, we present a lifted distributed ElGamal cryptosystem for federated learning, which can solve the multi-key problem in federated learning. Second, we develop a Practical Partially Single Instruction Multiple Data (PSIMD) parallelism scheme that can encode a plaintext matrix into a single plaintext for encryption, improving encryption efficiency and reducing communication cost in the partially homomorphic cryptosystem. In addition, based on the Convolutional Neural Network (CNN) and the designed cryptosystem, a novel privacy-preserving federated learning framework is designed using Momentum Gradient Descent (MGD). Finally, we evaluate the security and performance of PEPFL. The experimental results demonstrate that the scheme is practical, effective, and secure, with low communication and computation costs.
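The general idea of packing many plaintext values into a single plaintext can be illustrated with fixed-width bit slots. The actual PSIMD encoding is not given in the abstract, so `SLOT_BITS`, `pack`, and `unpack` are hypothetical. Additively homomorphic schemes let ciphertext operations add the underlying plaintexts, so one operation on a packed plaintext updates every slot at once, provided no slot overflows into its neighbor.

```python
SLOT_BITS = 16  # hypothetical slot width; must leave headroom for accumulated sums


def pack(values):
    """Encode non-negative ints into one integer, one SLOT_BITS-wide slot each."""
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << SLOT_BITS):
            raise ValueError("value does not fit in a slot")
        packed |= v << (SLOT_BITS * i)
    return packed


def unpack(packed, count):
    """Recover `count` slots from a packed integer."""
    mask = (1 << SLOT_BITS) - 1
    return [(packed >> (SLOT_BITS * i)) & mask for i in range(count)]


# A 2x2 matrix flattened row-major into 4 slots
a = pack([1, 2, 3, 4])
b = pack([10, 20, 30, 40])
print(unpack(a + b, 4))  # [11, 22, 33, 44]
print(unpack(3 * a, 4))  # [3, 6, 9, 12]
```

Real-valued model parameters would additionally need a fixed-point encoding, and negative values need an offset or a modular representation. Both are omitted in this sketch.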
Abstract: This paper studies the time-dependent analysis of an M/M/1 queueing model with single and multiple working vacations, balking, and vacation interruptions. Whenever the system becomes empty, the server begins a working vacation. During a working vacation, if the queue length reaches a positive threshold value k, the working vacation is interrupted and the server immediately starts service in an exhaustive manner. During working vacations, customers become discouraged by the slow service and exhibit balking behavior. The transient system size probabilities of the proposed model are derived explicitly using generating functions and continued fractions. Performance indices such as the mean and variance of the system size are also obtained. Further, numerical simulations are presented to analyze the impact of system parameters.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61972444.
Abstract: Computer vision (CV) algorithms are now used extensively in a myriad of applications. As multimedia data are generally well-formatted and regular, it is beneficial to leverage the massive parallel processing power of the underlying platform to improve the performance of CV algorithms. Single Instruction Multiple Data (SIMD) instructions, which perform the same operation on multiple data items in a single instruction, are extensively employed to improve the efficiency of CV algorithms. In this paper, we evaluate the power and effectiveness of the RISC-V vector extension (RV-V) on typical CV algorithms, such as Gray Scale, Mean Filter, and Edge Detection. Our examination shows that, compared with the baseline OpenCV implementation using scalar instructions, equivalent implementations using RV-V (version 0.8) can reduce the instruction count of the same CV algorithm by up to 24x when processing the same input images. However, the actual performance improvement, measured in cycle counts, depends strongly on the specific implementation of the underlying RV-V coprocessor. In our evaluation, using the vector coprocessor (with eight execution lanes) of the Xuantie C906, the vector versions of the CV algorithms achieve average speedups of up to 2.98x over their scalar counterparts.
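The vector-length-agnostic "strip-mining" pattern that RV-V code uses can be illustrated with the Gray Scale kernel. In each iteration, the hardware grants up to `vlmax` elements, and every instruction in the loop body processes all of them. This is a pure-Python emulation, not RVV code. The `vlmax` value is a hypothetical parameter, and the integer weights (77, 150, 29)/256 are a common fixed-point approximation of the BT.601 luma coefficients, not necessarily the ones used in the paper's or OpenCV's implementation.

```python
def rgb_to_gray(pixels, vlmax=8):
    """Convert (R, G, B) triples to 8-bit gray, processing up to vlmax pixels per step.

    Uses the integer approximation gray = (77 R + 150 G + 29 B) >> 8.
    """
    gray = []
    i, n = 0, len(pixels)
    while i < n:
        vl = min(vlmax, n - i)  # analogous to vsetvli granting vl elements
        chunk = pixels[i:i + vl]
        # On RVV, each arithmetic step below would be one vector instruction over vl lanes
        gray.extend((77 * r + 150 * g + 29 * b) >> 8 for r, g, b in chunk)
        i += vl
    return gray


print(rgb_to_gray([(255, 0, 0), (0, 255, 0), (0, 0, 255)]))  # [76, 149, 28]
```

The loop runs ⌈n / vlmax⌉ times instead of n times. This is the mechanism behind large instruction-count reductions. The cycle-count speedup is smaller because it also depends on how many lanes the coprocessor actually executes in parallel.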
Abstract: Global financial markets generate a tremendous amount of data every day, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly effective computing frameworks. However, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as the handling of outliers caused by stock suspensions in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports miscellaneous acceleration techniques, such as the streaming algorithm, the vectorization instruction set, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. On data sizes ranging from MiBs to GiBs, HXPY significantly outperforms other in-memory dataframe computing rivals, by up to hundreds of times.
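A streaming window function of the kind listed above can be sketched as a rolling mean with O(1) work per update. NaN marks a missing observation, such as a suspended stock with no trade. This is a generic illustration, not HXPY's API. The `RollingMean` class is hypothetical, and its NaN semantics (NaN occupies a window slot but is excluded from sum and count, with a `min_periods` threshold) are an assumption modeled on common dataframe behavior.

```python
import math
from collections import deque


class RollingMean:
    """Streaming rolling mean over the last `window` observations, skipping NaNs."""

    def __init__(self, window, min_periods=1):
        self.window = window
        self.min_periods = min_periods
        self.buf = deque()
        self.total = 0.0
        self.count = 0  # number of non-NaN values currently in the window

    def update(self, x):
        # Add the new observation
        self.buf.append(x)
        if not math.isnan(x):
            self.total += x
            self.count += 1
        # Evict the oldest observation once the window is full
        if len(self.buf) > self.window:
            old = self.buf.popleft()
            if not math.isnan(old):
                self.total -= old
                self.count -= 1
        return self.total / self.count if self.count >= self.min_periods else math.nan


rm = RollingMean(window=3)
print([rm.update(x) for x in [1.0, 2.0, 3.0, math.nan, 5.0]])  # [1.0, 1.5, 2.0, 2.5, 4.0]
```

A running sum avoids recomputing each window from scratch, which is the main saving of streaming algorithms over naive window recomputation. For long streams, compensated (Kahan) summation or periodic re-summation limits floating-point drift.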