Design and Performance Analysis of a High-Performance Northbridge Chip (cited by: 1)

Design Implementation and Performance Analysis of a High Performance Northbridge
Abstract: Improving the performance of a whole computer system requires not only more powerful CPUs but also high-performance chipsets. The chipset is responsible for data transfer between the CPU and peripheral devices, and in most current systems the memory controller is integrated into the northbridge. This makes the northbridge critical to memory-access performance, and therefore to overall system performance, because memory access latency has become one of the most significant bottlenecks in today's computer systems. This paper discusses methods for designing and implementing a northbridge that targets high performance. The architecture of NB2005, the northbridge designed for the Godson-2C processor, and the optimization techniques applied to each of its modules are described in detail. A novel dynamic page-management strategy in the DDR controller is proposed, which exploits the spatial locality of programs to reduce memory access latency. A new stream buffer (prefetch) mechanism is described, which at runtime jointly considers the memory access behavior and the state of the memory controller. A new buffer-swap mechanism in the PCI channel is also presented, which improves PCI bus throughput while keeping latency low. Experiments show that the Godson-2C system paired with NB2005 outperforms the system paired with a Marvell GT64240 northbridge in every aspect tested: NB2005 improves memory bandwidth by more than 40%, improves performance on the SPEC CPU2000 floating-point and integer programs by 12.2% and 2.5% respectively, and improves disk I/O performance by more than 30%.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 9, pp. 1501-1509 (9 pages).
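Funding: National Basic Research Program of China ("973" Program) (2005CB321600); National Natural Science Foundation of China (60673146, 60603049); National Science Fund for Distinguished Young Scholars (60325205); National High Technology Research and Development Program of China ("863" Program) (2006AA010201); Knowledge Innovation Program of the Institute of Computing Technology, Chinese Academy of Sciences (20056240); Beijing Natural Science Foundation (4072024).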
Keywords: northbridge; chipset; Godson-2 processor; memory controller; DDR; PCI