Study on Scheduling Policy Optimisation Based on Heterogeneous Reconfigurable Computing Array (超混合深度可重构计算阵列调度策略的优化研究)
Cited by: 1
Abstract: For a novel high-performance computer architecture, the heterogeneous reconfigurable computing array (HRCA), we propose two optimisation methods for scheduling task allocation on HRCA. (1) Optimised allocation of computing cores relieves or eliminates the "memory wall" problem that arises when core allocation causes a sharp increase in data communication volume. (2) Scheduling of task kernels overlaps the data exchange between two iterations with computation time, shortening the time that computing components spend waiting on data exchange. Using the N-body FMM algorithm as an example, we verify that both methods effectively reduce the system's demand on off-chip memory access speed and improve system utilisation.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2014, No. 6, pp. 278-281, 307 (5 pages)
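Funding: Key Project of the National High Technology Research and Development Program of China (2009AA012201); Shanghai Key Discipline Construction Project (J50103); Shanghai University Innovation Fund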
Keywords: HRCA; FPGA; reconfigurable computing; memory wall; FMM