Binary translation of CUDA program for heterogeneous and many-core architecture


Abstract: Porting CUDA programs to other heterogeneous many-core processor platforms, in particular domestically developed processors, by means of binary translation has practical value: it broadens the range of platforms on which CUDA applications can run, exploits the many-core capability of the target platform, and supports the domestic processor industry. This paper designs a binary translation framework for CUDA programs. Starting from the CUDA executable, the framework takes a divide-and-conquer approach and translates the host-side code and the device-side code separately. The paper focuses on the solutions to several key problems in the porting process: extraction of the device-side code, mapping of the computing model, mapping of the storage model, barrier synchronization, and instruction translation. Experiments verify the functional correctness of the system.

Authors: Li Nan, Pang Jianmin, Shan Zheng

Affiliation: PLA Information Engineering University

Source: Computer Engineering and Applications (CSCD; Peking University Core Journal), 2016, No. 7, pp. 17-23 (7 pages)

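Funding: National High Technology Research and Development Program of China (863 Program) (No.2009AA012201); National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (No.2009ZX01036-001-001 No.0412-7)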

Keywords: CUDA program; binary translation; computing model mapping; storage model mapping; synchronization barrier; instruction translation

Classification: TP311.52 [Automation and Computer Technology: Computer Software and Theory]



