
Amdahl定律在层次化片上多核处理器中的扩展 (Extending Amdahl's Law to Hierarchical Chip Multicore Processors) Cited by: 7

Revisiting Amdahl's Law in the Hierarchical Chip Multicore Processors
Abstract: Hierarchical chip multicore processors (HCMPs) group several tightly coupled processing cores into supernodes. This organization supports the locality of memory references and on-chip communication well, and so effectively reduces the data communication overhead of on-chip multicore systems. Building on previous work on the Amdahl cost/performance model for multicore processors, this paper introduces on-chip data communication latency as a new element of the Amdahl task computation cost and constructs an extended Amdahl speedup model for HCMPs that accounts for their non-uniform communication latency. Using the extended model, the paper investigates how the speedup of an HCMP depends on the supernode configuration, in particular on the supernode size, meaning the number of cores in a supernode. Simulation results reveal that, to obtain a good speedup, HCMP designers must carefully trade off the number of supernodes against the supernode size. For an HCMP with a given total number of cores, the supernode size that yields the best system performance usually lies at an intermediate value rather than at the maximum or the minimum, and this optimal size changes with the overall number of cores in the system. When designing a specific hierarchical chip multicore processor, designers can use the proposed performance model to help make better decisions.
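The trade-off described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual equations, so the cost terms below are assumptions made purely for illustration: `c_local` is a hypothetical per-core cost of communication inside a supernode, `c_global` is a hypothetical per-supernode cost of communication between supernodes, and both are simply added to the classic Amdahl execution time. The function names are likewise hypothetical.

```python
def hcmp_speedup(f, n, k, c_local=0.001, c_global=0.004):
    """Toy Amdahl-style speedup for n cores grouped into supernodes of size k.

    f        : parallel fraction of the task (0 <= f <= 1)
    n        : total number of cores
    k        : cores per supernode (must divide n)
    c_local  : ASSUMED cost factor that grows with supernode size
    c_global : ASSUMED cost factor that grows with the number of supernodes
    This is an illustrative model, not the model from the paper.
    """
    if k < 1 or n % k != 0:
        raise ValueError("k must be a positive divisor of n")
    m = n // k  # number of supernodes
    # Assumed communication term, charged only to the parallel part.
    comm = f * (c_local * k + c_global * m)
    # Classic Amdahl time (serial part + parallel part) plus communication.
    return 1.0 / ((1.0 - f) + f / n + comm)


def best_supernode_size(f, n, **costs):
    """Return the divisor k of n that maximizes the toy speedup."""
    sizes = [k for k in range(1, n + 1) if n % k == 0]
    return max(sizes, key=lambda k: hcmp_speedup(f, n, k, **costs))


if __name__ == "__main__":
    for n in (64, 256):
        k = best_supernode_size(0.99, n)
        print(n, k, round(hcmp_speedup(0.99, n, k), 2))
```

With these assumed default costs, the best supernode size is 16 for 64 cores and 32 for 256 cores: an intermediate value that shifts as the system grows, which mirrors the qualitative finding reported in the abstract. Setting both cost factors to zero recovers the classic Amdahl speedup.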
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2012, No. 1, pp. 83-92 (10 pages)
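Funding: National High Technology Research and Development Program of China (863 Program) (2007AA01Z108, 2009AA011704); Ministry of Education Innovative Research Team Program, "High-Performance Microprocessor Design Technology" (IRT0614); National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2009ZX01034-001-001-006); National Natural Science Foundation of China (60676010)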
Keywords: hierarchical architecture; chip multicore processor; data communications; performance model; Amdahl's law

