Research on Hierarchical Explicit Memory Access Mechanism Based on ESCA System

Cited by: 2

Abstract: To address the memory wall problem in high-performance hybrid computing systems, this paper analyzes the characteristics of the hybrid computing model and the limitations of traditional memory access mechanisms, and proposes a hierarchical explicit memory access mechanism suited to hybrid computing systems. The mechanism is implemented and evaluated on the Engineering and Scientific Computing Architecture (ESCA), a multi-core hybrid computing system. Experimental results show that, for the core application kernel DGEMM, latency hiding accounts for 56% of the total run time and yields a 1.5x speedup. The mechanism therefore helps bridge the speed gap between computation and memory access and improves the computing efficiency of the system.

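Authors: Rao Jinli (饶金理), Wu Dan (吴丹), Chen Pan (陈攀), Dong Mian (董冕), Deng Chengnuo (邓承诺), Dai Kui (戴葵), Zou Xuecheng (邹雪城)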

Affiliation: Department of Electronic Science and Technology, Huazhong University of Science and Technology

Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, No. 22, pp. 24-27, 34 (5 pages)

Funding: National Natural Science Foundation of China (NSFC60973035, NSFC60976027); Natural Science Foundation of Hubei Province (2010CBD02705)

Keywords: hybrid computing; memory wall; multi-core processor; Engineering and Scientific Computing Architecture (ESCA) system; hierarchical explicit memory access; latency hiding

Classification: TP302.1 [Automation and Computer Technology: Computer System Architecture]
