
A Source-to-Source Automatic Mapping Method for CPU-GPU Architectures

Novel automatic mapping technology on CPU-GPU heterogeneous systems
Abstract: Developing and porting applications for GPUs is difficult. To address this, a method is proposed that maps a serial source program to an equivalent parallel source program. The method extracts the nesting hierarchy of parallelizable loops from the serial source, establishes a correspondence between loop structures and GPU threads, and generates the GPU-side kernel code. The CPU-side control code is generated according to the read/write attributes of variable references. A prototype compiler built on this method translates C source programs into CUDA source programs automatically. Functional and performance tests of the prototype show that the generated CUDA programs are functionally equivalent to the original C programs and run significantly faster. This alleviates, to some extent, the difficulty of porting compute-intensive applications to CPU-GPU heterogeneous multicore systems.
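To make the loop-to-thread mapping and the read/write-driven host code more concrete, the following is a minimal, hypothetical Python sketch of a toy source-to-source generator. It is not the authors' prototype. It assumes a perfectly nested, dependence-free two-level loop whose arrays all hold N×M elements in row-major order. The names `LoopNest`, `gen_kernel`, `gen_host`, and `mapped_kernel` are illustrative inventions. The outer loop index is mapped to the y dimension of the CUDA thread grid and the inner index to x, with a bounds guard taking the place of the original loop bounds. The emitted host fragment copies arrays that are read to the device before the launch and copies arrays that are written back afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class LoopNest:
    """Hypothetical IR for a perfectly nested, dependence-free 2-level loop:
         for (i = 0; i < N; i++)
             for (j = 0; j < M; j++)
                 body;
    Assumption: every array holds N*M elements of elem_type, row-major."""
    outer_var: str
    outer_bound: str
    inner_var: str
    inner_bound: str
    body: str
    reads: set = field(default_factory=set)   # arrays read in the body
    writes: set = field(default_factory=set)  # arrays written in the body
    elem_type: str = "float"

    def arrays(self):
        # Deterministic order so kernel parameters and launch args match.
        return sorted(self.reads | self.writes)


def gen_kernel(nest, name="mapped_kernel"):
    """Emit a CUDA kernel: outer loop -> y thread dimension, inner -> x."""
    params = [f"{nest.elem_type} *{a}" for a in nest.arrays()]
    params += [f"int {nest.outer_bound}", f"int {nest.inner_bound}"]
    return "\n".join([
        f"__global__ void {name}({', '.join(params)}) {{",
        f"    int {nest.outer_var} = blockIdx.y * blockDim.y + threadIdx.y;",
        f"    int {nest.inner_var} = blockIdx.x * blockDim.x + threadIdx.x;",
        # The guard replaces the loop bounds for threads past the array edge.
        f"    if ({nest.outer_var} < {nest.outer_bound} && "
        f"{nest.inner_var} < {nest.inner_bound}) {{",
        f"        {nest.body}",
        "    }",
        "}",
    ])


def gen_host(nest, name="mapped_kernel", block=16):
    """Emit a CPU-side control fragment whose transfers follow read/write sets.
    Assumes host arrays and the int bounds are already declared by the caller."""
    bytes_expr = (f"(size_t){nest.outer_bound} * {nest.inner_bound}"
                  f" * sizeof({nest.elem_type})")
    lines = [f"size_t bytes = {bytes_expr};"]
    for a in nest.arrays():
        lines.append(f"{nest.elem_type} *d_{a};")
        lines.append(f"cudaMalloc((void **)&d_{a}, bytes);")
        if a in nest.reads:  # read on the GPU: copy host -> device first
            lines.append(f"cudaMemcpy(d_{a}, {a}, bytes, cudaMemcpyHostToDevice);")
    lines.append(f"dim3 block({block}, {block});")
    lines.append(f"dim3 grid(({nest.inner_bound} + block.x - 1) / block.x, "
                 f"({nest.outer_bound} + block.y - 1) / block.y);")
    args = [f"d_{a}" for a in nest.arrays()] + [nest.outer_bound, nest.inner_bound]
    lines.append(f"{name}<<<grid, block>>>({', '.join(args)});")
    for a in nest.arrays():
        if a in nest.writes:  # written on the GPU: copy device -> host after
            lines.append(f"cudaMemcpy({a}, d_{a}, bytes, cudaMemcpyDeviceToHost);")
    for a in nest.arrays():
        lines.append(f"cudaFree(d_{a});")
    return "\n".join(lines)


if __name__ == "__main__":
    # Element-wise addition C = A + B over an N x M array.
    nest = LoopNest("i", "N", "j", "M",
                    "C[i * M + j] = A[i * M + j] + B[i * M + j];",
                    reads={"A", "B"}, writes={"C"})
    print(gen_kernel(nest))
    print()
    print(gen_host(nest))
```

Mapping the inner loop to `threadIdx.x` is a common choice because adjacent threads then access adjacent elements of a row-major array, which favors coalesced global-memory accesses. The sketch skips the host-to-device copy for a write-only array. That is safe only if the loop nest overwrites every element, so a more conservative generator would also copy in any written array whose full write coverage it cannot prove.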
Source: Computer Engineering and Applications, 2015, No. 21, pp. 41-47 (7 pages); indexed in CSCD and the Peking University Core Journals list
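Funding: National Natural Science Foundation of China (No. 61173039); Young Scientists Fund (No. 61202041); National High-Tech Research and Development Program of China (863 Program) (No. 2012AA010904, No. 2012AA01A306); Shenzhen Science and Technology Program (No. JCYJ20120615101127404)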
Keywords: General-Purpose Graphics Processing Unit (GPGPU); Compute Unified Device Architecture (CUDA); automatic mapping; source-to-source compilation

