
Binary translation of CUDA program for heterogeneous and many-core architecture
Abstract: Porting CUDA programs to other heterogeneous many-core processor platforms, in particular domestically developed processors, by means of binary translation is of practical value: it extends the range of platforms on which CUDA programs can run, exploits the many-core strengths of the target platform, and supports the national processor industry. A binary translation framework for CUDA programs is designed. Starting from the CUDA executable, it takes a divide-and-conquer approach and translates the host-side code and the device-side code separately. The paper focuses on the solutions to several key problems encountered during porting: extraction of the device-side code, mapping of the computing model, mapping of the storage model, barrier synchronization, and instruction translation. Experiments verify the functional correctness of the system.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 7, pp. 17-23 (7 pages)
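Funding: National High Technology Research and Development Program of China (863 Program) (No. 2009AA012201); National Major Science and Technology Project on Core Electronic Devices, High-end Generic Chips and Basic Software (No. 2009ZX01036-001-001, No. 0412-7)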
Keywords: CUDA program; binary translation; computing model mapping; storage model mapping; synchronization barrier; instruction translation
