GPU Memory Optimization Through Program Restructuring Methods

Cited by: 4
Abstract: The high performance-to-cost ratio of graphics processing units (GPUs) is attracting a growing number of scientific computing applications. Compared with graphics applications, scientific computing programs contain complex data dependences and irregular data access patterns, which degrade their performance on GPUs. To address this, we propose a program restructuring method oriented to the GPU architecture. Computation restructuring increases the parallelism and compute intensity of a program, improving the utilization of the GPU's computing resources. Data restructuring eliminates irregular data accesses and uses vector data types to raise the memory bandwidth available to the program. Experimental results show that the proposed optimizations reduce the execution time of scientific computing programs on the GPU, achieving speedups of 1.17x to 8.91x.
Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2011, No. 10, pp. 1921-1927 (7 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: Supported by the Shanghai Leading Academic Discipline Project (B114) and the AMD University Cooperation Program.
Keywords: GPU; scientific computing programs; computation restructuring; data restructuring; memory optimization; vector data type
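The abstract does not give the paper's actual transformations, so the following is only a minimal host-side sketch of the general idea behind data restructuring. It assumes a hypothetical irregular index order and a 4-wide vector (in the spirit of a float4-style type); the function names `reorder_for_sequential_access` and `pack_into_vectors` are illustrative and do not come from the paper. The data is gathered once into the order in which it will be read, then grouped into fixed-width vectors so that each access fetches several elements.

```python
from array import array

VECTOR_WIDTH = 4  # assumed width, mirroring a float4-style vector type


def reorder_for_sequential_access(values, indices):
    """Gather values[indices[i]] once so later passes can read position i directly."""
    return array("f", (values[j] for j in indices))


def pack_into_vectors(values, width=VECTOR_WIDTH, pad=0.0):
    """Group a flat sequence into fixed-width tuples, padding the last one."""
    padded = list(values) + [pad] * (-len(values) % width)
    return [tuple(padded[i:i + width]) for i in range(0, len(padded), width)]


values = array("f", [10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
indices = [5, 0, 3, 1, 4, 2]  # hypothetical irregular access order

# Pay the cost of the irregular gather once, up front.
restructured = reorder_for_sequential_access(values, indices)
# Later passes walk the data sequentially, one vector at a time.
vectors = pack_into_vectors(restructured)

print(list(restructured))
print(vectors)
```

On a GPU the same layout change would typically be done before kernel launch, so that neighboring threads read neighboring addresses. Whether the one-time reordering pays off depends on how often the restructured data is reused.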