Memory Access and Communication Fusion Compiler Optimization for Sunway Many-core Processors
Abstract The multi-level on-chip memory hierarchy of Sunway many-core processors is an important structure for alleviating the many-core "memory wall". The fully software-managed SPM (scratchpad memory) and the on-chip RMA communication mechanism create many opportunities to improve application performance, but they also make application development, optimization, and porting considerably harder. To exploit the on-chip memory hierarchy for application performance while reducing the programmer's optimization burden, this paper proposes a compiler optimization method that fuses memory access and communication across the multi-level memory hierarchy. The method first designs fusion compiler directives that pass high-level program information to the compiler. It then builds a benefit model for the optimization and an iterative, heuristic framework for solving for a loop optimization scheme; the compiler both solves for the scheme and performs the corresponding code transformation. Using compiler-generated DMA and RMA bulk data transfers, core data that resides in lower, high-latency levels of the hierarchy is buffered in bulk into higher, low-latency levels. Experiments and analysis on three typical test cases show that the proposed optimization performs comparably to manual optimization and significantly better than the unoptimized programs.
Authors FANG Yan-Fei; LI Yan-Bing; DONG En-Ming; WANG Yun-Fei; LIU Qi (National Research Center of Parallel Computer Engineering and Technology, Beijing 100190, China)
Source Journal of Software (软件学报), indexed in EI, CSCD, and the Peking University Core Journals list; 2024, No. 6, pp. 2648-2667 (20 pages)
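Funding Fund of the Laboratory for Advanced Computing and Intelligence Engineering (national level); National Key Research and Development Program of China, Key Special Project (2021YFB0301100).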
Keywords Sunway many-core processor; multi-level memory hierarchy; RMA communication; parallel language; compiler optimization
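
The buffering transformation summarized in the abstract can be illustrated with a minimal C sketch. It is not the authors' generated code and does not use the Sunway DMA or RMA interfaces: `dma_get` and `dma_put` are hypothetical stand-ins implemented with `memcpy`, `SPM_TILE` is an assumed buffer capacity, and the stack arrays merely play the role of the software-managed scratchpad. The sketch only shows the general shape of the idea: tile the loop, fetch each tile in bulk into a low-latency buffer, compute on the buffer, and write results back in bulk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed capacity of the per-core scratchpad buffer, in elements. */
#define SPM_TILE 256

/* Hypothetical stand-ins for bulk DMA transfers. Plain memcpy is used so
 * the sketch runs on any machine; a real port would call the platform's
 * DMA primitives here. */
static void dma_get(double *spm_dst, const double *mem_src, size_t n) {
    memcpy(spm_dst, mem_src, n * sizeof(double));
}

static void dma_put(double *mem_dst, const double *spm_src, size_t n) {
    memcpy(mem_dst, spm_src, n * sizeof(double));
}

/* Baseline loop: every element access goes directly to main memory. */
void scale_add_direct(double *y, const double *x, double a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Tiled loop: each tile is copied in bulk into local buffers, processed
 * there, and the updated tile is copied back in bulk. */
void scale_add_buffered(double *y, const double *x, double a, size_t n) {
    double x_spm[SPM_TILE];
    double y_spm[SPM_TILE];

    for (size_t base = 0; base < n; base += SPM_TILE) {
        /* The last tile may be shorter than SPM_TILE. */
        size_t len = (n - base < SPM_TILE) ? (n - base) : SPM_TILE;
        assert(len <= SPM_TILE);

        dma_get(x_spm, x + base, len);
        dma_get(y_spm, y + base, len);

        for (size_t i = 0; i < len; i++) {
            y_spm[i] = a * x_spm[i] + y_spm[i];
        }

        dma_put(y + base, y_spm, len);
    }
}
```

Both functions compute the same result. The buffered version replaces many individual high-latency accesses with a few bulk transfers per tile, which is the kind of trade-off a benefit model would weigh when choosing a tile size.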