
Research on Hierarchical Explicit Memory Access Mechanism Based on ESCA System (基于ESCA系统的层次化显式访存机制研究)

Cited by: 2
Abstract: To address the memory wall problem in high-performance hybrid computing systems, this paper analyzes the characteristics of the hybrid computing model and the limitations of traditional memory access mechanisms, and proposes a hierarchical explicit memory access mechanism suited to hybrid computing systems. The mechanism is implemented and evaluated on the Engineering and Scientific Computing Architecture (ESCA) multi-core processor system. Experimental results show that, for the core application kernel DGEMM, hidden memory access latency accounts for 56% of the total run time and yields a 1.5x speedup. The mechanism thus narrows the speed gap between computation and memory access and improves the computing efficiency of the system.
Source: Computer Engineering (《计算机工程》), 2011, No. 22, pp. 24-27, 34 (5 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (NSFC60973035, NSFC60976027); Natural Science Foundation of Hubei Province (2010CBD02705).
Keywords: hybrid computing; memory wall; multi-core processor; Engineering and Scientific Computing Architecture (ESCA) system; hierarchical explicit memory access; latency hiding
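The abstract reports that explicit memory access lets DGEMM hide memory latency behind computation. The record does not describe how ESCA does this, so the following is only a minimal sketch of one common way to structure that overlap, double buffering over matrix tiles. All names (`load_tile`, `multiply_accumulate`, `tiled_matmul_double_buffered`) are hypothetical, and a background thread stands in for a hardware transfer engine. In CPython the overlap is structural rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tile(matrix, row0, col0, size):
    """Copy a size x size tile into a fresh local buffer (stand-in for an explicit transfer)."""
    return [row[col0:col0 + size] for row in matrix[row0:row0 + size]]


def multiply_accumulate(c, a_tile, b_tile, row0, col0):
    """Add a_tile times b_tile into the tile of c that starts at (row0, col0)."""
    size = len(a_tile)
    for i in range(size):
        for j in range(size):
            acc = 0.0
            for k in range(size):
                acc += a_tile[i][k] * b_tile[k][j]
            c[row0 + i][col0 + j] += acc


def tiled_matmul_double_buffered(a, b, tile):
    """Square matrix product; tiles for step s+1 are requested before step s is computed."""
    n = len(a)
    assert n % tile == 0, "sketch assumes the matrix size is a multiple of the tile size"
    c = [[0.0] * n for _ in range(n)]
    # Every (i, j, k) tile step, in the order it will be computed.
    steps = [(i, j, k)
             for i in range(0, n, tile)
             for j in range(0, n, tile)
             for k in range(0, n, tile)]

    def fetch(step):
        i, j, k = step
        return load_tile(a, i, k, tile), load_tile(b, k, j, tile)

    # Assumption: a single worker thread plays the role of the transfer engine.
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        pending = transfer_engine.submit(fetch, steps[0])
        for index, (i, j, k) in enumerate(steps):
            a_tile, b_tile = pending.result()  # wait until the current buffers are filled
            if index + 1 < len(steps):
                # Request the next pair of buffers before computing on the current pair.
                pending = transfer_engine.submit(fetch, steps[index + 1])
            multiply_accumulate(c, a_tile, b_tile, i, j)
    return c
```

For example, `tiled_matmul_double_buffered([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], 1)` returns `[[19.0, 22.0], [43.0, 50.0]]`. The point of the structure is that each tile is fetched by an explicit request rather than pulled in implicitly by a cache miss, so the request for the next tile can be issued while the current one is still being computed on.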

References (9)

  • 1 Barker K J, Davis K, Hoisie A, et al. Entering the Petaflop Era: The Architecture and Performance of Roadrunner[C]//Proc. of the ACM/IEEE Conference on Supercomputing. Piscataway, USA: IEEE Press, 2008.
  • 2 Wulf W A, McKee S A. Hitting the Memory Wall: Implications of the Obvious[J]. Computer Architecture News, 1995, 23(1): 20-24.
  • 3 Margolus N. An Embedded DRAM Architecture for Large-scale Spatial-lattice Computations[C]//Proc. of the 27th Annual International Symposium on Computer Architecture. New York, USA: ACM Press, 2000: 149-160.
  • 4 Zhang Ying (张英), Yang Xuejun (杨学军), Tang Yuhua (唐玉华), et al. PIM: A Technology That Can Effectively Alleviate the Memory Wall Problem. Journal of Computer Research and Development (计算机研究与发展), 2004, 41: 347-351.
  • 5 Wu Dan, Dai Kui, Zou Xuecheng, et al. A High Efficient On-chip Interconnection Network in SIMD CMPs[C]//Proc. of the 10th International Conference on Algorithms and Architecture for Parallel Processing. Busan, Korea: [s.n.], 2010: 149-162.
  • 6 Chen Pan, Dai Kui, Wu Dan, et al. The Parallel Algorithm Implementation of Matrix Multiplication Based on ESCA[C]//Proc. of the IEEE Asia Pacific Conference on Circuits and Systems. [S.l.]: IEEE Press, 2010.
  • 7 Huang Anwen (黄安文), Gao Jun (高军), Zhang Minxuan (张民选). Research on the On-chip Memory System of Multi-core Processors[J]. Computer Engineering (计算机工程), 2010, 36(4): 4-6. Cited by: 5
  • 8 Rixner S. Stream Processor Architecture[M]. Norwell, USA: Kluwer Academic Publishers, 2001.
  • 9 Lee Hyuk-Jae, Robertson J P. Generalized Cannon’s Algorithm for Parallel Matrix Multiplication[C]//Proc. of the 11th International Conference on Supercomputing. New York, USA: [s.n.], 1997: 44-51.

