Parallel cost model for heterogeneous multi-core processors (面向异构多核处理器的并行代价模型)

Cited by: 3


Abstract: Most existing parallel cost models are designed for shared-memory or distributed-memory architectures and are not fully suited to heterogeneous multi-core processors. To address this, a parallel cost model for heterogeneous multi-core processors is proposed. It quantitatively characterizes how the computing capability of the cores, memory access latency, and data transfer cost affect the parallel execution time of loops, which improves the accuracy of identifying accelerated parallel loops. Experimental results show that the proposed model effectively identifies accelerated parallel loops, and that using its results as the basis for back-end parallel code generation effectively improves the performance of parallel programs on heterogeneous multi-core processors.
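The abstract names three quantities that drive the model (core computing capability, memory access latency, data transfer cost) but this record does not give the formulas. The following is only a minimal sketch of how such a cost model might be organized, not the authors' model: the classes `Loop` and `Machine`, all field names, the additive cost formula, and the rule "a loop counts as accelerated when its predicted parallel time is below its predicted serial time" are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Loop:
    # Hypothetical per-loop characteristics a compiler might estimate.
    iterations: int        # trip count
    ops_per_iter: float    # arithmetic operations per iteration
    mem_per_iter: float    # memory accesses per iteration
    bytes_in: float        # bytes copied to the accelerator cores
    bytes_out: float       # bytes copied back to the host


@dataclass
class Machine:
    # Hypothetical machine parameters (not taken from the paper).
    host_ops_per_s: float
    accel_ops_per_s: float       # throughput of one accelerator core
    accel_cores: int
    host_mem_latency_s: float
    accel_mem_latency_s: float
    transfer_startup_s: float    # fixed cost of one transfer
    transfer_bytes_per_s: float  # host <-> accelerator bandwidth


def serial_time(loop: Loop, m: Machine) -> float:
    """Predicted time when the loop runs on the host core alone."""
    per_iter = (loop.ops_per_iter / m.host_ops_per_s
                + loop.mem_per_iter * m.host_mem_latency_s)
    return loop.iterations * per_iter


def parallel_time(loop: Loop, m: Machine) -> float:
    """Predicted time when iterations are split evenly over accelerator cores."""
    iters_per_core = math.ceil(loop.iterations / m.accel_cores)
    per_iter = (loop.ops_per_iter / m.accel_ops_per_s
                + loop.mem_per_iter * m.accel_mem_latency_s)
    compute = iters_per_core * per_iter
    # One transfer in and one transfer out, each paying the startup cost.
    transfer = (2 * m.transfer_startup_s
                + (loop.bytes_in + loop.bytes_out) / m.transfer_bytes_per_s)
    return compute + transfer


def is_accelerated(loop: Loop, m: Machine) -> bool:
    """Assumed decision rule: parallelize only if it is predicted to be faster."""
    return parallel_time(loop, m) < serial_time(loop, m)
```

Under these assumptions, a loop with a large trip count and heavy arithmetic tends to pass the test, while a short loop whose data transfer dominates does not, which is the kind of distinction the abstract says the model is meant to capture.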

Authors: HUANG Pinfeng (黄品丰), ZHAO Rongcai (赵荣彩), YAO Yuan (姚远), ZHAO Jie (赵捷)

Affiliation: State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2013, Issue 6, pp. 1544-1547 (4 pages)

Funding: National "Core Electronic Devices, High-end Generic Chips and Basic Software" ("核高基") Major Project (2009ZX01036-001-001-2)

Keywords: auto-parallelization; parallel cost model; heterogeneous multi-core; data transfer cost; accelerated parallel loop

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]

