Extension of Amdahl's Law in Hierarchical Chip Multicore Processors (Cited by: 7)

Revisiting Amdahl's Law in the Hierarchical Chip Multicore Processors


Abstract: Hierarchical chip multicore processors (HCMPs) group several tightly coupled processing cores into supernodes. This structure supports locality in memory references and on-chip communication, and thus effectively reduces the overhead of on-chip data communication among the cores. Building on existing Amdahl cost/performance models of multicore processors, this paper introduces on-chip data communication latency as a new component of the task computation cost in Amdahl's model, so that the model accounts for the non-uniform communication latency of HCMP architectures, and constructs an extended Amdahl speedup model for HCMPs. Using this extended model, the paper studies how speedup depends on the supernode configuration, where the supernode size is the number of cores in a supernode, and derives several design rules. Simulation results show that to achieve good speedup, HCMP designers must carefully balance the number of supernodes against the supernode size. For an HCMP with a given total number of cores, the supernode size that yields optimal performance is usually an intermediate value rather than the largest or smallest possible one, and this optimal size changes as the system scale changes. When designing a specific HCMP, the proposed performance model can help designers make better configuration decisions.
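
The abstract states the supernode trade-off only qualitatively. As an illustration, here is a minimal Python sketch of that kind of trade-off. It is not the authors' model: it starts from classic Amdahl's law with n identical cores (not the multicore cost/performance model the paper builds on) and adds a hypothetical communication term in which intra-supernode cost grows linearly with the supernode size s, while inter-supernode cost grows with sqrt(n/s). The coefficients c_intra and c_inter, the shapes of both cost terms, and the function names are all assumptions chosen for illustration.

```python
import math


def amdahl_speedup(f: float, n: int) -> float:
    """Classic Amdahl speedup for parallel fraction f on n identical cores."""
    return 1.0 / ((1.0 - f) + f / n)


def hcmp_speedup(f: float, n: int, s: int,
                 c_intra: float = 0.001, c_inter: float = 0.004) -> float:
    """Hypothetical speedup of n cores grouped into n // s supernodes of s cores.

    Assumed communication overhead, added to the normalized parallel time:
      c_intra * s: intra-supernode latency grows with supernode size
      c_inter * sqrt(k): inter-supernode latency scales with the side length
        of a square 2D mesh of k = n // s supernodes
    """
    if s < 1 or n % s != 0:
        raise ValueError("supernode size must be a positive divisor of n")
    k = n // s
    comm = f * (c_intra * s + c_inter * math.sqrt(k))
    return 1.0 / ((1.0 - f) + f / n + comm)


def best_supernode_size(f: float, n: int, **costs: float) -> int:
    """Return the divisor s of n that maximizes hcmp_speedup (exhaustive search)."""
    sizes = [s for s in range(1, n + 1) if n % s == 0]
    return max(sizes, key=lambda s: hcmp_speedup(f, n, s, **costs))


if __name__ == "__main__":
    f = 0.99  # assumed parallel fraction
    for n in (16, 64, 1024):
        s = best_supernode_size(f, n)
        print(f"n={n:5d}  best s={s:3d}  supernodes={n // s:4d}  "
              f"speedup={hcmp_speedup(f, n, s):6.2f}  "
              f"(ideal Amdahl {amdahl_speedup(f, n):6.2f})")
```

With these made-up coefficients the search picks s = 4, 8, and 16 for n = 16, 64, and 1024: never the smallest or largest size, and growing with the core count. The reason is that the assumed overhead a*s + b*sqrt(n/s) is minimized at s* = (b*sqrt(n)/(2a))^(2/3), which increases with n. Under these assumptions this mirrors the qualitative conclusion in the abstract; the paper's actual model and numbers may differ.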

Authors: Chen Shuming (陈书明), Chen Shenggang (陈胜刚), Yin Yaming (尹亚明)

Institution: School of Computer, National University of Defense Technology

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2012, No. 1, pp. 83-92 (10 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

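Funding: National High Technology Research and Development Program of China (863 Program) (2007AA01Z108, 2009AA011704); Ministry of Education Innovative Research Team Program "High-Performance Microprocessor Design Technology" (IRT0614); National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (2009ZX01034-001-001-006); National Natural Science Foundation of China (60676010)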

Keywords: hierarchical architecture; chip multicore processor; data communications; performance model; Amdahl's law

Classification: TP302 (Automation and Computer Technology; Computer System Architecture)

