The wide acceptance and data deluge in medical imaging processing require faster and more efficient systems to be built.Due to the advances in heterogeneous architectures recently,there has been a resurgence in the fi...The wide acceptance and data deluge in medical imaging processing require faster and more efficient systems to be built.Due to the advances in heterogeneous architectures recently,there has been a resurgence in the first research aimed at FPGA-based as well as GPGPU-based accelerator design.This paper quantitatively analyzes the workload,computational intensity and memory performance of a single-particle 3D reconstruction application,called EMAN,and parallelizes it on CUDA GPGPU architectures and decouples the memory operations from the computing flow and orchestrates the thread-data mapping to reduce the overhead of off-chip memory operations.Then it exploits the trend towards FPGA-based accelerator design,which is achieved by offloading computingintensive kernels to dedicated hardware modules.Furthermore,a customized memory subsystem is also designed to facilitate the decoupling and optimization of computing dominated data access patterns.This paper evaluates the proposed accelerator design strategies by comparing it with a parallelized program on a 4-cores CPU.The CUDA version on a GTX480 shows a speedup of about 6 times.The performance of the stream architecture implemented on a Xilinx Virtex LX330 FPGA is justified by the reported speedup of 2.54 times.Meanwhile,measured in terms of power efficiency,the FPGA-based accelerator outperforms a 4-cores CPU and a GTX480 by 7.3 times and 3.4 times,respectively.展开更多
In modern VLSI technology, hundreds of thousands of arithmetic units fit on a 1cm^2 chip. The challenge is supplying them with instructions and data. Stream architecture is able to solve the problem well. However, the...In modern VLSI technology, hundreds of thousands of arithmetic units fit on a 1cm^2 chip. The challenge is supplying them with instructions and data. Stream architecture is able to solve the problem well. However, the applications suited for typical stream architecture are limited. This paper presents the definition of regular stream and irregular stream, and then describes MASA (Multiple-morphs Adaptive Stream Architecture) prototype system which supports different execution models according to applications' stream characteristics. This paper first discusses MASA architecture and stream model, and then explores the features and advantages of MASA through mapping stream applications to hardware. Finally MASA is evaluated by ten benchmarks. The result is encouraging.展开更多
基金Supported by the National Basic Research Program of China(No.2012CB316502)the National High Technology Research and DevelopmentProgram of China(No.2009AA01A129)the National Natural Science Foundation of China(No.60921002)
文摘The wide acceptance and data deluge in medical imaging processing require faster and more efficient systems to be built.Due to the advances in heterogeneous architectures recently,there has been a resurgence in the first research aimed at FPGA-based as well as GPGPU-based accelerator design.This paper quantitatively analyzes the workload,computational intensity and memory performance of a single-particle 3D reconstruction application,called EMAN,and parallelizes it on CUDA GPGPU architectures and decouples the memory operations from the computing flow and orchestrates the thread-data mapping to reduce the overhead of off-chip memory operations.Then it exploits the trend towards FPGA-based accelerator design,which is achieved by offloading computingintensive kernels to dedicated hardware modules.Furthermore,a customized memory subsystem is also designed to facilitate the decoupling and optimization of computing dominated data access patterns.This paper evaluates the proposed accelerator design strategies by comparing it with a parallelized program on a 4-cores CPU.The CUDA version on a GTX480 shows a speedup of about 6 times.The performance of the stream architecture implemented on a Xilinx Virtex LX330 FPGA is justified by the reported speedup of 2.54 times.Meanwhile,measured in terms of power efficiency,the FPGA-based accelerator outperforms a 4-cores CPU and a GTX480 by 7.3 times and 3.4 times,respectively.
文摘In modern VLSI technology, hundreds of thousands of arithmetic units fit on a 1cm^2 chip. The challenge is supplying them with instructions and data. Stream architecture is able to solve the problem well. However, the applications suited for typical stream architecture are limited. This paper presents the definition of regular stream and irregular stream, and then describes MASA (Multiple-morphs Adaptive Stream Architecture) prototype system which supports different execution models according to applications' stream characteristics. This paper first discusses MASA architecture and stream model, and then explores the features and advantages of MASA through mapping stream applications to hardware. Finally MASA is evaluated by ten benchmarks. The result is encouraging.