A Source-to-Source Automatic Mapping Method for CPU-GPU Architectures

Novel automatic mapping technology on CPU-GPU heterogeneous systems


Abstract: To address the difficulty of developing and porting applications for GPUs, this paper proposes a method for mapping serial source code to parallel source code. The method extracts the nesting-level information of parallelizable loops from the serial source program, establishes a correspondence between loop-body structures and GPU threads, and generates the GPU kernel function code. It generates the CPU-side control code according to the read/write attributes of variable references. A prototype compiler based on this method automatically translates C source programs into CUDA source programs. Functional and performance tests of the prototype show that the generated CUDA code is functionally equivalent to the original C code and performs significantly better, which to some extent alleviates the difficulty of porting compute-intensive applications to CPU-GPU heterogeneous multi-core systems.
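The two steps named in the abstract (loop levels mapped to GPU threads, and CPU-side control derived from read/write attributes) can be illustrated with a toy generator. This is only a minimal sketch and not the authors' prototype: the names `Loop`, `gen_kernel`, and `plan_transfers` are hypothetical, the choice of giving the innermost loop the `x` thread dimension is an assumption, and so is the rule that read arrays are copied to the device before the launch while written arrays are copied back afterwards.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    var: str    # induction variable of a parallelizable loop, e.g. "i"
    bound: str  # exclusive upper bound expression, e.g. "n"


# CUDA offers at most three thread-index dimensions.
AXES = ["x", "y", "z"]


def gen_kernel(name, loops, params, body):
    """Emit CUDA kernel text; each parallel loop level becomes one thread-index dimension."""
    if not 1 <= len(loops) <= len(AXES):
        raise ValueError("this sketch handles one to three nested parallel loops")
    # Assumption: the innermost loop gets the x dimension, the next one y, then z.
    pairs = list(zip(reversed(loops), AXES))
    lines = [f"__global__ void {name}({', '.join(params)}) {{"]
    for loop, axis in reversed(pairs):  # declare indices outermost loop first
        lines.append(
            f"    int {loop.var} = blockIdx.{axis} * blockDim.{axis} + threadIdx.{axis};"
        )
    # The loop bounds turn into a guard, since the grid may be larger than the data.
    guard = " && ".join(f"{loop.var} < {loop.bound}" for loop in loops)
    lines.append(f"    if ({guard}) {{")
    lines.append(f"        {body}")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


def plan_transfers(access):
    """Map each array's read/write attributes to host-side copies (assumed rule).

    access: array name -> set containing "r" and/or "w".
    Returns (arrays to copy host-to-device, arrays to copy device-to-host).
    """
    copy_in = sorted(name for name, attrs in access.items() if "r" in attrs)
    copy_out = sorted(name for name, attrs in access.items() if "w" in attrs)
    return copy_in, copy_out


# Serial C input being modelled:
#   for (i = 0; i < n; i++)
#     for (j = 0; j < m; j++)
#       c[i * m + j] = a[i * m + j] + b[i * m + j];
loops = [Loop("i", "n"), Loop("j", "m")]
src = gen_kernel(
    "add_kernel",
    loops,
    ["float *c", "const float *a", "const float *b", "int n", "int m"],
    "c[i * m + j] = a[i * m + j] + b[i * m + j];",
)
copy_in, copy_out = plan_transfers({"a": {"r"}, "b": {"r"}, "c": {"w"}})
print(src)
print("host-to-device:", copy_in, "device-to-host:", copy_out)
```

A real source-to-source compiler would obtain the loop nest and the access sets from dependence and data-flow analysis of the C program; here they are written by hand to keep the example short.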

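Authors: Zhu Zhengdong (朱正东), Liu Yuan (刘袁), Wei Hongchang (魏洪昌), Yan Kang (颜康), Wang Yinfeng (王寅峰), Dong Xiaoshe (董小社)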

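Affiliations: School of Electronic and Information Engineering, Xi'an Jiaotong University; Shenzhen Institute of Information Technology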

Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2015, No. 21, pp. 41-47 (7 pages)

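Funding: National Natural Science Foundation of China (No. 61173039); Youth Fund Project (No. 61202041); National High Technology Research and Development Program of China (863 Program) (Nos. 2012AA010904, 2012AA01A306); Shenzhen Science and Technology Program (No. JCYJ20120615101127404)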

Keywords: General Purpose Graphics Processing Unit (GPGPU); Compute Unified Device Architecture (CUDA); automatic mapping; source-to-source compilation

Classification: TP303 [Automation and Computer Technology: Computer System Architecture]

