Design Tradeoffs of High Performance DSPs for General-Purpose HPC (cited by: 4)
Abstract: GPUs, which deliver computing capability of several TFLOPS, are used in high-performance computing (HPC) to accelerate parallel workloads. However, their low peak-performance utilization and low power efficiency have become bottlenecks to further improvement of system performance. To address this problem, the authors investigate introducing high-performance DSPs into general-purpose HPC. To support general-purpose HPC efficiently, the paper proposes an architectural framework for high-performance DSPs and, by mapping the GotoBLAS library onto this architecture, builds a performance model for GEMM on it. The authors study the main factors that influence GEMM efficiency, including performance, memory hierarchy, core size, and number of cores. Several guiding conclusions are summarized to help design DSPs that are efficient for general-purpose HPC. Evaluation results show that near-peak performance can be achieved on a TFLOPS DSP with as little hardware cost as possible.
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, No. 4, pp. 790-798 (9 pages)
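Funding: Supported by the National Natural Science Foundation of China (60906014, 61070036), the Research Fund of the High Performance Computing Joint Doctoral Supervisor Group of the National University of Defense Technology, and the Doctoral Program Foundation of the Ministry of Education (20094307110009)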
Keywords: high-performance computing (HPC); matrix multiplication (GEMM); digital signal processor (DSP); model; design tradeoffs
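
The abstract's point that the memory hierarchy and block size govern GEMM efficiency can be illustrated with a roofline-style estimate. The sketch below is not the paper's performance model: the function name `gemm_block_efficiency`, the simple off-chip traffic count, and the sample peak and bandwidth figures are all assumptions made for illustration.

```python
def gemm_block_efficiency(mb, nb, kb, peak_gflops, bandwidth_gbs, bytes_per_word=8):
    """Estimate the fraction of peak reached by one blocked GEMM update.

    Models C[mb x nb] += A[mb x kb] * B[kb x nb] with every operand block
    moved between off-chip memory and on-chip storage exactly once
    (an assumption; real kernels reuse and overlap transfers differently).
    """
    # One multiply and one add per (i, j, k) triple.
    flops = 2.0 * mb * nb * kb
    # Read A and B once; read and write C once each.
    words = mb * kb + kb * nb + 2 * mb * nb
    # Arithmetic intensity in floating-point operations per byte.
    intensity = flops / (words * bytes_per_word)
    # Attainable rate is capped by compute peak or by memory bandwidth.
    attainable_gflops = min(peak_gflops, intensity * bandwidth_gbs)
    return attainable_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOPS peak, 50 GB/s off-chip bandwidth.
    for block in (32, 64, 128, 256, 512):
        eff = gemm_block_efficiency(block, block, block, 1000.0, 50.0)
        print(f"block={block:4d}  estimated efficiency={eff:.3f}")
```

Under these assumptions, efficiency rises with block size until the kernel becomes compute-bound, which is why on-chip memory capacity appears alongside core size and core count as a design tradeoff.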