Parallel cost model for heterogeneous multi-core processors (面向异构多核处理器的并行代价模型)

Cited by: 3
Abstract: Most existing parallel cost models are designed for shared-memory or distributed-memory architectures and are therefore not fully suited to heterogeneous multi-core processors. To address this problem, a parallel cost model for heterogeneous multi-core processors is proposed. It quantitatively characterizes the impact of the computing capacity of the cores, memory access latency, and data transfer cost on the parallel execution time of loops, thereby improving the accuracy of accelerated parallel loop recognition. Experimental results show that the proposed model can effectively recognize accelerated parallel loops, and that using its recognition results as the basis for back-end parallel code generation can effectively improve the performance of parallel programs on heterogeneous multi-core processors.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2013, No. 6, pp. 1544-1547 (4 pages)
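Funding: National "Core Electronic Devices, High-end Generic Chips and Basic Software" (核高基) Major Project of China (2009ZX01036-001-001-2)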
Keywords: auto-parallelization; parallel cost model; heterogeneous multi-core; data transfer cost; accelerated parallel loop
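The abstract names three factors (core computing capacity, memory access latency, and data transfer cost) but does not give the model's formulas. The following is only a minimal illustrative sketch of how such factors might be combined to decide whether a loop is worth parallelizing; it is not the paper's model. All names (`LoopProfile`, `Machine`, `is_accelerated`) and the simple additive formula are assumptions made for illustration, as is the reading of an "accelerated parallel loop" as one whose estimated parallel time is lower than its estimated sequential time.

```python
import math
from dataclasses import dataclass


@dataclass
class LoopProfile:
    """Hypothetical static description of one candidate loop."""
    iterations: int
    ops_per_iter: float           # arithmetic operations per iteration
    mem_accesses_per_iter: float  # memory accesses per iteration
    bytes_in: float               # data copied to the accelerator cores
    bytes_out: float              # data copied back to the host core


@dataclass
class Machine:
    """Hypothetical parameters of a heterogeneous multi-core processor."""
    host_ops_per_sec: float
    accel_ops_per_sec: float      # throughput of a single accelerator core
    accel_cores: int
    host_mem_latency: float       # seconds per access on the host core
    accel_mem_latency: float      # seconds per access on an accelerator core
    transfer_bandwidth: float     # bytes per second between host and accelerators
    transfer_startup: float       # fixed cost in seconds per transfer


def sequential_time(loop: LoopProfile, m: Machine) -> float:
    """Estimated time to run the whole loop on the host core."""
    per_iter = (loop.ops_per_iter / m.host_ops_per_sec
                + loop.mem_accesses_per_iter * m.host_mem_latency)
    return loop.iterations * per_iter


def parallel_time(loop: LoopProfile, m: Machine) -> float:
    """Estimated time to run the loop on the accelerator cores,
    including one transfer in and one transfer out (assumed additive)."""
    per_iter = (loop.ops_per_iter / m.accel_ops_per_sec
                + loop.mem_accesses_per_iter * m.accel_mem_latency)
    iters_per_core = math.ceil(loop.iterations / m.accel_cores)
    transfer = (2 * m.transfer_startup
                + (loop.bytes_in + loop.bytes_out) / m.transfer_bandwidth)
    return iters_per_core * per_iter + transfer


def is_accelerated(loop: LoopProfile, m: Machine) -> bool:
    """True if parallel execution is estimated to beat sequential execution."""
    return parallel_time(loop, m) < sequential_time(loop, m)
```

Under this sketch, a short loop can be rejected even when it is parallelizable, because the fixed transfer cost outweighs the computation saved; this is the kind of case a cost model that ignores data transfer would misjudge.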