Register allocation in base mathematics library for platform of heterogeneous multi-core (异构多核平台下基础数学库寄存器分配方法)

Cited by: 2


Abstract: Mathematical functions running on the co-processor of a heterogeneous multi-core processor lose performance because of table look-ups and a shortage of registers. To address this, the paper proposes a register allocation method based on hot paths. It exploits the path characteristics of mathematical functions and the fact that the two classes of register resources differ in access cost, and reallocates registers between the hot path and the less-used paths: the efficient (low-cost) registers held by the less-used paths are exchanged with the time-consuming (high-cost) registers used on the hot path. This removes as much memory access from the hot path as possible. The hot path becomes faster at the expense of the less-used paths, which raises the overall performance of the function. Measured data show that the method improves the performance of typical co-processor mathematical functions by more than 18%, so the computing capability of the co-processor is used effectively.
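The exchange described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Value` record, the two register classes `FAST` and `SLOW`, the per-access costs, and the hot-path probability of 0.9 are all assumptions made up for illustration. The sketch only shows the idea of giving hot-path values the cheap registers currently held by values on a less-used (cold) path, and how that shifts the expected access cost.

```python
from dataclasses import dataclass

FAST, SLOW = "fast", "slow"
# Hypothetical per-access costs (in cycles) for the two register classes.
COST = {FAST: 1, SLOW: 4}


@dataclass
class Value:
    name: str
    path: str      # "hot" or "cold" (less-used path)
    accesses: int  # accesses per execution of its path
    reg: str       # FAST or SLOW


def swap_for_hot_path(values):
    """Exchange slow registers on the hot path with fast ones on the cold path."""
    # Hot-path values stuck in slow registers, most frequently accessed first.
    needy = sorted((v for v in values if v.path == "hot" and v.reg == SLOW),
                   key=lambda v: v.accesses, reverse=True)
    # Cold-path values holding fast registers, least frequently accessed first.
    donors = sorted((v for v in values if v.path == "cold" and v.reg == FAST),
                    key=lambda v: v.accesses)
    # Pair them up; zip stops when either list runs out.
    for hot, cold in zip(needy, donors):
        hot.reg, cold.reg = cold.reg, hot.reg
    return values


def expected_cost(values, hot_probability):
    """Average access cost per call, weighting each path by how often it runs."""
    weight = {"hot": hot_probability, "cold": 1.0 - hot_probability}
    return sum(weight[v.path] * v.accesses * COST[v.reg] for v in values)


# Made-up example: one hot-path value starts in a slow register.
values = [
    Value("x", "hot", 6, SLOW),
    Value("poly_c0", "hot", 3, FAST),
    Value("table_idx", "cold", 2, FAST),
    Value("tmp", "cold", 1, SLOW),
]
before = expected_cost(values, 0.9)
swap_for_hot_path(values)
after = expected_cost(values, 0.9)
print(f"expected cost before: {before:.1f}, after: {after:.1f}")
```

With these invented numbers the cold path becomes more expensive, but because it runs rarely the expected cost per call drops. This is the trade-off the abstract describes.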

Authors: Guo Zhenghong (郭正红), Guo Shaozhong (郭绍忠), Xu Jinchen (许瑾晨), Zhang Zhaotian (张兆天)

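Affiliations: Luoyang Electronic Equipment Test Center of China; School of Information Engineering, Information Engineering University; State Key Laboratory of Digital Engineering and Advanced Computing, Information Engineering University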

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2014, Issue A01, pp. 86-89 (4 pages)

Keywords: heterogeneous multi-core; look-up table; register; hot-path; mathematics library

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

