Providing Source Code Level Portability Between CPU and GPU with MapCG

Abstract: Graphics processing units (GPUs) have taken an important role in the general purpose computing market in recent years. At present, the common approach to programming GPUs is to write GPU-specific code with low-level GPU APIs such as CUDA. Although this approach can achieve good performance, it creates serious portability issues, as programmers are required to write a specific version of the code for each potential target architecture. This results in high development and maintenance costs. We believe it is desirable to have a programming model which provides source code portability between CPUs and GPUs, as well as between different GPUs. This would allow programmers to write one version of the code, which can be compiled and executed on either CPUs or GPUs efficiently without modification. In this paper, we propose MapCG, a MapReduce framework to provide source code level portability between CPUs and GPUs. In contrast to other approaches such as OpenCL, our framework, based on MapReduce, provides a high-level programming model and makes programming much easier. We describe the design of MapCG, including the MapReduce-style high-level programming framework and the runtime system on the CPU and GPU. A prototype of the MapCG runtime, supporting multi-core CPUs and NVIDIA GPUs, was implemented. Our experimental results show that this implementation can execute the same source code efficiently on multi-core CPU platforms and GPUs, achieving an average speedup of 1.6-2.5x over previous implementations of MapReduce on eight commonly used applications.
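The idea of writing one version of the code and letting the runtime choose where it runs can be illustrated with a minimal sketch. It is not MapCG's API, which the abstract does not show: the function names `run_serial` and `run_threaded` are hypothetical, and two plain Python backends stand in for the CPU and GPU runtimes. The point is only that the user-written map and reduce functions stay unchanged while the backend is swapped.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def word_count_map(chunk):
    """User-written map: emit (word, 1) for every word in a text chunk."""
    return [(word, 1) for word in chunk.split()]


def word_count_reduce(key, values):
    """User-written reduce: sum the counts emitted for one word."""
    return key, sum(values)


def _group_and_reduce(reduce_fn, emitted):
    """Group intermediate pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for pairs in emitted:
        for key, value in pairs:
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


def run_serial(map_fn, reduce_fn, chunks):
    """Hypothetical backend 1: run every map call in the calling thread."""
    return _group_and_reduce(reduce_fn, [map_fn(c) for c in chunks])


def run_threaded(map_fn, reduce_fn, chunks, workers=4):
    """Hypothetical backend 2: run the map calls on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        emitted = list(pool.map(map_fn, chunks))
    return _group_and_reduce(reduce_fn, emitted)


# The same user code is handed to both backends without modification.
chunks = ["gpu cpu gpu", "cpu map reduce", "map gpu"]
serial_result = run_serial(word_count_map, word_count_reduce, chunks)
threaded_result = run_threaded(word_count_map, word_count_reduce, chunks)
```

Both backends receive the same `word_count_map` and `word_count_reduce`; only the scheduling of the map calls differs. A real runtime of the kind the abstract describes would additionally have to handle memory allocation and intermediate-data storage on each device, which this sketch leaves out.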
Authors: Chun-Tao Hong, De-Hao Chen, Yu-Bei Chen, Wen-Guang Chen, Wei-Min Zheng, Hai-Bo Lin (洪春涛, 陈德颢, 陈羽北, 陈文光, 郑纬民, 林海波). Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; IBM China Research Lab, Beijing 100094, China.
Source: Journal of Computer Science & Technology (English edition; indexed in SCIE, EI, CSCD), 2012, Issue 1, pp. 42-56 (15 pages).
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60973143, the National High Technology Research and Development 863 Program of China under Grant No. 2008AA01A201, and the National Basic Research 973 Program of China under Grant No. 2007CB310900.
Keywords: portability, parallel, GPU programming

