Design Implementation and Performance Analysis of a High Performance Northbridge (一种高性能北桥芯片的设计及性能分析)

Cited by: 1

Abstract: Improving the performance of a whole computer system depends not only on faster processors but also on high-performance chipsets. The chipset carries the communication between the CPU and peripheral devices, and in most current systems the memory controller is integrated into the northbridge. This makes the northbridge critical to memory access performance and to the system as a whole, all the more so now that memory access latency is one of the most significant bottlenecks in computer systems. This paper discusses methods for designing and implementing a high-performance northbridge. It describes in detail the architecture of NB2005, the companion northbridge for the Godson-2C (龙芯2C) processor, and the optimization techniques applied to each module. A novel dynamic page management strategy in the DDR controller exploits the spatial locality of programs to reduce memory access latency. A new stream buffer (prefetch) mechanism considers, at run time, both the memory access behavior and the state of the memory controller. A new buffer-swap mechanism in the PCI channel provides high throughput and low latency on the PCI bus. Experiments show that the Godson-2C system with NB2005 outperforms the same system with the Marvell GT64240 northbridge in all aspects tested. Specifically, NB2005 improves memory bandwidth by more than 40%, yields speedups of 12.2% and 2.5% on the SPEC CPU2000 floating-point and integer programs respectively, and improves disk I/O performance by at least 30%.
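The abstract names a dynamic page management strategy but does not give its algorithm. As a rough illustration of the general idea only, the sketch below models one DRAM bank whose controller uses a 2-bit saturating counter to decide, after each access, whether to keep the row open (betting on spatial locality) or close it early (hiding the precharge cost). The class `Bank`, the function `total_latency`, the counter scheme, and the cycle costs `T_CAS`, `T_RCD`, `T_RP` are all assumptions made for this example and are not taken from NB2005.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical timing costs in controller cycles (illustrative, not NB2005's).
T_CAS = 3  # column access to a row that is already open
T_RCD = 3  # activate a row before the column access
T_RP = 3   # precharge (close) the currently open row


@dataclass
class Bank:
    """One DRAM bank with a history-based open/close page predictor."""
    open_row: Optional[int] = None  # row currently held open, if any
    last_row: Optional[int] = None  # row touched by the previous access
    counter: int = 2                # 2-bit saturating counter; >= 2 means "keep open"

    def access(self, row: int) -> int:
        """Serve one access and return its latency in cycles."""
        if self.open_row == row:
            latency = T_CAS                 # page hit
        elif self.open_row is None:
            latency = T_RCD + T_CAS         # bank already precharged
        else:
            latency = T_RP + T_RCD + T_CAS  # page conflict

        # Train the predictor on whether this access reused the previous row.
        if self.last_row is not None:
            if self.last_row == row:
                self.counter = min(3, self.counter + 1)
            else:
                self.counter = max(0, self.counter - 1)
        self.last_row = row

        # Keep the row open only if recent history suggests locality.
        # Closing early is assumed to overlap with idle time (assumption).
        self.open_row = row if self.counter >= 2 else None
        return latency


def total_latency(rows: Iterable[int]) -> int:
    """Total latency of an access stream on a single, initially idle bank."""
    bank = Bank()
    return sum(bank.access(r) for r in rows)
```

Under these assumed costs, a stream that stays within one row pays the activate cost once and then only page hits, while a stream that touches a new row every time quickly trains the counter to close rows early and so avoids most page-conflict penalties. A real controller would also have to account for refresh, multiple banks, and cases where the early precharge cannot be hidden.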

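Authors: Zeng Hongbo (曾洪博), Hu Mingchang (胡明昌), Li Wen (李文), Cai Fei (蔡飞), Tang Zhimin (唐志敏)

Affiliation: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 9, pp. 1501-1509 (9 pages)

Funding: National Basic Research Program of China (973 Program) (2005CB321600); National Natural Science Foundation of China (60673146, 60603049); National Science Fund for Distinguished Young Scholars (60325205); National High Technology Research and Development Program of China (863 Program) (2006AA010201); Knowledge Innovation Project of the Institute of Computing Technology, Chinese Academy of Sciences (20056240); Beijing Natural Science Foundation (4072024)

Keywords: northbridge; chipset; Godson-2 processor; memory controller (DDR); PCI

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]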

