Funding: This project was supported by the National Natural Science Foundation of China (60135020).
Abstract: The flexibility of traditional image processing systems is limited because those systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics, such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The performance of the parallel system is evaluated by practical experiment.
Abstract: This paper defines the communication-efficiency, which is directly related to the cost-efficiency, and studies the relationship between the communication-efficiency and the processor-efficiency when they are applied to scalability analysis. An example algorithm is used to analyze some typical architectures.
Funding: Supported by the National Natural Science Foundation of China (No. 11172134) and the Funding of Jiangsu Innovation Program for Graduate Education (No. CXLX13_132).
Abstract: A personal desktop platform with teraflops peak performance and thousands of cores can be realized at the price of a conventional workstation by using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The implementation techniques for the CUDA kernels, the double-layered thread hierarchy, and the varied memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated on a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved on a single-GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
Abstract: An optimal algorithmic approach to task scheduling for the triplet-based architecture (TriBA) is proposed in this paper. TriBA is considered to be a high-performance, distributed parallel computing architecture. It consists of a 2D grid of small, programmable processing units, each physically connected to its three neighbors. In a parallel or distributed environment, an efficient assignment of tasks to the processing elements is imperative to achieve fast job turnaround time. Moreover, the sojourn time experienced by each individual job should be minimized. The arriving jobs are parallel applications, each consisting of multiple independent tasks that must be assigned to processor queues instantaneously as they arrive. The processors service these tasks independently and concurrently. The key scheduling issue is that, when some queue backlogs are small, an incoming job should first spread its tasks to those lightly loaded queues in order to take advantage of the parallel processing gain. Our algorithmic approach achieves optimality in task scheduling by assigning consecutive tasks to a triplet of processors, exploiting locality in tasks. The experimental results show that allocating tasks to triplets of processing elements is efficient and optimal. A comparison with a well-accepted interconnection strategy, the 2D mesh, is presented to demonstrate the effectiveness of our algorithmic approach for TriBA. Finally, we conclude that TriBA can be an efficient interconnection strategy for computation-intensive applications if task assignment is carried out optimally using the algorithmic approach.
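The scheduling idea described in this abstract (spread an incoming job over lightly loaded queues, and keep consecutive tasks together on one triplet of processors) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `assign_job`, the greedy "least total backlog" rule, and the representation of triplets as tuples of processor indices are all assumptions made for illustration.

```python
def assign_job(backlogs, task_costs, triplets):
    """Greedy sketch: place consecutive tasks on the least-loaded triplet.

    backlogs   -- current queue backlog of each processor (mutated in place)
    task_costs -- service times of the job's tasks, in arrival order
    triplets   -- hypothetical grouping of processors into 3-tuples of indices
    Returns the processor index chosen for each task.
    """
    placement = []
    # Take the job's tasks three at a time so consecutive tasks share a triplet.
    for start in range(0, len(task_costs), 3):
        chunk = task_costs[start:start + 3]
        # Choose the triplet whose combined backlog is currently smallest.
        best = min(triplets, key=lambda t: sum(backlogs[p] for p in t))
        # Inside the triplet, fill the lightest queues first.
        order = sorted(best, key=lambda p: backlogs[p])
        for proc, cost in zip(order, chunk):
            backlogs[proc] += cost
            placement.append(proc)
    return placement


if __name__ == "__main__":
    queues = [5, 5, 5, 0, 1, 2]
    print(assign_job(queues, [1, 1, 1, 1], [(0, 1, 2), (3, 4, 5)]))
    print(queues)
```

In this toy run the first three tasks go to the idle triplet, and the fourth task follows them because that triplet is still the lighter one. A real TriBA scheduler would also have to account for the interconnect topology, which this sketch ignores.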
Abstract: The evolutionary neural network (ENN) shows high performance in function optimization and in finding approximately global optima when searching large and complex spaces. It is one of the most efficient and adaptive optimization techniques and is widely used to provide candidate solutions that are evaluated against the fitness of the problem. ENN has an extraordinary ability to search globally and learn an approximately optimal solution without using gradient information of the error function. However, ENN is computationally demanding and requires parallel processing platforms such as field-programmable gate arrays (FPGAs) and graphics processing units (GPUs) to achieve good performance. This work presents several new implementations of ENN that explore and adopt different techniques and opportunities for parallel processing. Different versions of the ENN algorithm have been implemented and parallelized on an FPGA platform for low latency by exploiting parallelism and pipelining. Real data from a mass spectrometry data (MSD) application were used to examine and verify our implementations. This is an important and computationally intensive application that needs to search for the optimal features (peaks) in MSD in order to distinguish cancer patients from control patients. The ENN algorithm is also implemented and parallelized on single-core and GPU platforms for comparison purposes. The computation time of our optimized algorithm on the FPGA and the GPU has been improved by factors of 6.75 and 6, respectively.
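The gradient-free search that this abstract attributes to ENN can be illustrated with a small evolutionary loop over the weights of a tiny network. The sketch below is an assumption-laden illustration, not the paper's implementation: the 2-2-1 tanh network, the truncation selection with Gaussian mutation, and all names (`forward`, `loss`, `evolve`) and parameter values are hypothetical.

```python
import math
import random


def forward(weights, x):
    # Hypothetical 2-2-1 network: two tanh hidden units, one linear output.
    h1 = math.tanh(weights[0] * x[0] + weights[1] * x[1] + weights[2])
    h2 = math.tanh(weights[3] * x[0] + weights[4] * x[1] + weights[5])
    return weights[6] * h1 + weights[7] * h2 + weights[8]


def loss(weights, data):
    # Mean squared error; only evaluated, never differentiated.
    return sum((forward(weights, x) - y) ** 2 for x, y in data) / len(data)


def evolve(data, pop_size=20, generations=200, sigma=0.3, seed=0):
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(9)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Each individual's loss is independent of the others, which is
        # the part that a parallel platform could evaluate concurrently.
        population.sort(key=lambda w: loss(w, data))
        history.append(loss(population[0], data))
        parents = population[: pop_size // 4]  # elitism: best quarter survives
        children = [
            [g + rng.gauss(0, sigma) for g in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = min(population, key=lambda w: loss(w, data))
    history.append(loss(best, data))
    return best, history


if __name__ == "__main__":
    xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    best, history = evolve(xor)
    print(history[0], history[-1])
```

Because the surviving parents are carried over unchanged, the best loss never increases from one generation to the next. How far it drops depends on the seed and the mutation scale.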