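Abstract: Multi-core architectures typically adopt parallel programming models that are explicitly directed by the user, with locks and synchronization variables providing synchronization. The transactional memory model can resolve a series of problems caused by lock mechanisms and improve program concurrency. This paper presents a parallel programming model for a transactional-memory-based multi-core architecture (Transactional-Memory based Chip Multiple-Superscaler, TMCMS), together with an execution model for loop programs. Taking an FFT program as an example, it describes the parallelization method for loop structures and the compiler transformation process in detail. In preliminary experiments, as the number of processing units increases from 1 to 16, IPC (Instructions Per Cycle) grows nearly linearly with the support of the proposed programming model, indicating that the model can fully exploit the potential fine-grained thread-level parallelism in programs while keeping parallel programming simple.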
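The abstract does not give the FFT loop transformation itself, so the following is only a minimal sketch of the property that makes such a loop a candidate for parallel, transaction-style execution. It assumes a textbook iterative radix-2 FFT (not necessarily the paper's variant) and hypothetical helper names (`butterflies`, `fft`, `dft`). Within one stage, every array index belongs to exactly one butterfly, so the read/write sets of the iterations are disjoint and the iterations can run in any order without conflict. The sketch demonstrates this by shuffling the butterfly order and checking the result against a direct DFT.

```python
import cmath
import random


def bit_reverse_permute(a):
    """Reorder a power-of-two-length list into bit-reversed index order."""
    n = len(a)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = int(format(i, "b").zfill(bits)[::-1], 2) if bits else 0
        out[r] = a[i]
    return out


def butterflies(n, half):
    """List the (top, bottom, twiddle) triples of the stage with half-span `half`."""
    step = 2 * half
    return [
        (start + k, start + k + half, cmath.exp(-2j * cmath.pi * k / step))
        for start in range(0, n, step)
        for k in range(half)
    ]


def fft(x, shuffle=False, seed=0):
    """Iterative radix-2 FFT; len(x) is assumed to be a power of two."""
    a = bit_reverse_permute([complex(v) for v in x])
    n = len(a)
    rng = random.Random(seed)
    half = 1
    while half < n:
        work = butterflies(n, half)
        if shuffle:
            # Any order is valid inside a stage: each index appears in
            # exactly one butterfly, so iterations never touch the same data.
            rng.shuffle(work)
        for top, bot, w in work:
            t = w * a[bot]
            a[top], a[bot] = a[top] + t, a[top] - t
        half *= 2  # stages themselves must still run one after another
    return a


def dft(x):
    """Direct O(n^2) DFT used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Calling `fft(data, shuffle=True)` and `fft(data)` on the same power-of-two-length input should agree, which is the order independence a transactional runtime could rely on within a stage. How TMCMS actually partitions and commits these iterations is not stated in the abstract.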
Funding: Supported by the National Key Basic Research Program of China (973 Program, Grant No. 2011CB706804), the Shanghai Municipal Science and Technology Commission of China (Grant No. 11QH1401400), and the Research Project of State Key Laboratory of Mechanical System & Vibration of China (Grant No. MSVMS201102)
Abstract: High-speed computational performance is gained at the cost of large hardware resources, which restricts the application of high-accuracy algorithms because hardware cost is limited in practical use. To solve this problem, this study proposes a novel method for designing a field programmable gate array (FPGA)-based non-uniform rational B-spline (NURBS) interpolator and motion controller that adopts the embedded multiprocessor technique. The hardware and software design for the multiprocessor is presented, with one processor for NURBS interpolation and the other for position servo control. Performance analysis and experiments on an X-Y table are carried out, and the hardware cost and the time consumed by interpolation and motion control are compared with existing methods. The results indicate that, compared with existing methods, the proposed method can reduce the hardware cost by 97.5% while using a higher-accuracy interpolation algorithm within a period of 0.5 ms. The method ensures real-time performance and interpolation accuracy, significantly reduces the hardware cost, and is practical for industrial application.
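The abstract does not say which interpolation algorithm runs on the FPGA, so the sketch below only illustrates the basic computation any NURBS interpolator has to perform in each period: evaluating a curve point for a given parameter value. It uses the textbook rational B-spline definition with the Cox-de Boor recursion. The function names (`basis`, `nurbs_point`) and the quarter-circle test data are assumptions for illustration, not taken from the paper.

```python
import math


def basis(i, p, u, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so the curve end point is included.
        if u == knots[-1] and knots[i] < u and u == knots[i + 1]:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:  # skip terms with zero-length knot spans
        left = (u - knots[i]) / d1 * basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_point(u, ctrl, weights, knots, degree):
    """Evaluate a planar NURBS curve point as a weighted rational sum."""
    num_x = num_y = den = 0.0
    for i, ((x, y), w) in enumerate(zip(ctrl, weights)):
        b = basis(i, degree, u, knots) * w
        num_x += b * x
        num_y += b * y
        den += b
    return num_x / den, num_y / den


# Quarter of the unit circle as a degree-2 NURBS curve (illustrative data).
CTRL = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
WEIGHTS = [1.0, math.sqrt(2) / 2, 1.0]
KNOTS = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

Evaluating `nurbs_point(u, CTRL, WEIGHTS, KNOTS, 2)` for `u` in `[0, 1]` traces the arc from `(1, 0)` to `(0, 1)`. A real-time interpolator would additionally choose each `u` step so that the tool feed rate stays within limits, which this sketch does not attempt.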
Abstract: A helper thread of a task can hide the memory access time of irregular data on a chip multi-core processor (CMP). To construct a compiler that effectively supports helper threads in a multi-core scenario based on a shared last-level cache, this paper studies the conditions under which their performance is stable. Because no existing model allows extensive investigation of the impact of stable conditions, we present a basis for pre-computation that is formalized by our degraded task-pair 〈T, T'〉 with the helper thread, and analyze its stable conditions. Finally, a novel performance model and a method for constructing pre-computation based on our positive degraded task-pair are proposed. Experiments show that the approach is efficient. If memory level parallelism (MLP) is further exploited for the task-pair, 〈T, T'〉 can reach better performance.
Funding: Supported by the National High Technology Development 863 Program of China (2002AA1Z030) and the China Postdoctoral Science Foundation (2003034151)
Abstract: A single-chip multiprocessor (CMP) combined with fault-tolerant (FT) techniques offers an ideal architecture for achieving high availability while sustaining high computing performance. The FT design of a single-chip multiprocessor is described, including techniques ranging from hardware redundancy to software support and firmware strategy. The design aims at masking the influence of errors and automatically correcting the system state.