
Register allocation method for the basic mathematics library on heterogeneous multi-core platforms (cited by: 2)

Register allocation in base mathematics library for platform of heterogeneous multi-core
Abstract: Math functions on the co-processor of a heterogeneous multi-core processor lose performance because of table look-ups and a shortage of registers. To address this, the paper proposes a register allocation method based on hot paths. Drawing on the path characteristics of math functions and on the fact that the two classes of register resources differ in usage cost, the method reallocates registers between the hot path and the less-used paths: the low-cost registers held by the less-used paths are exchanged with the high-cost registers used on the hot path, which removes as many memory accesses from the hot path as possible. Hot-path performance improves at the expense of the less-used paths, and the overall performance of the function improves as a result. Measured data show that the method improves the performance of typical co-processor math functions by more than 18%, making effective use of the co-processor's computing power.
Source: Journal of Computer Applications (《计算机应用》), CSCD, PKU Core; 2014, Issue A01, pp. 86-89 (4 pages)
Keywords: heterogeneous multi-core; look-up table; register; hot-path; mathematics library
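
The register exchange described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm, which this record does not give: the two register classes (`FAST`, `SLOW`), the cost values, the `Var` type, the path names, and both function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical register classes and per-access costs (illustrative values only).
FAST, SLOW = "fast", "slow"
COST = {FAST: 1, SLOW: 4}


@dataclass
class Var:
    name: str
    reg_class: str  # FAST or SLOW
    paths: set      # names of the paths that use this variable


def weighted_cost(variables, path_freq):
    """Sum each variable's access cost, weighted by how often its paths run."""
    return sum(COST[v.reg_class] * path_freq[p] for v in variables for p in v.paths)


def swap_for_hot_path(variables, path_freq):
    """Give hot-path variables FAST registers taken from cold-path variables."""
    hot = max(path_freq, key=path_freq.get)
    # Hot-path variables currently held in high-cost registers.
    needy = [v for v in variables if hot in v.paths and v.reg_class == SLOW]
    # Low-cost registers held by variables that never appear on the hot path.
    donors = [v for v in variables if hot not in v.paths and v.reg_class == FAST]
    for hot_var, cold_var in zip(needy, donors):
        hot_var.reg_class, cold_var.reg_class = FAST, SLOW
    return variables


# Assumed execution counts: the hot path runs 90 times, the cold path 10 times.
freq = {"hot": 90, "cold": 10}
demo = [
    Var("x", FAST, {"hot"}),
    Var("t", SLOW, {"hot"}),
    Var("e", FAST, {"cold"}),
    Var("f", SLOW, {"cold"}),
]
before = weighted_cost(demo, freq)
swap_for_hot_path(demo, freq)
after = weighted_cost(demo, freq)
print(before, after)  # 500 260
```

In this toy setup the cold path becomes more expensive (50 to 80 cost units) while the hot path becomes cheaper (450 to 180), which mirrors the trade-off the abstract describes.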


