A Bus Arbitration Scheme for Memory Access Performance Optimization

Abstract: The order in which memory transactions are processed strongly affects memory access performance. Multiple outstanding transactions issued by the same SoC device often target consecutive addresses and share the same read/write (R/W) type. Under traditional bus arbitration schemes, however, the outstanding transaction sequences from different devices reach the memory controller interleaved with one another. Because the memory controller can reorder accesses only within a limited scheduling window, such sequences usually cannot access memory consecutively. To address this problem, we propose a novel bus arbitration scheme, CGH, which exploits the communication behavior of SoC devices. CGH identifies sequences of outstanding transactions that come from the same device and share the same row address and R/W type, and grants them consecutively, reducing the number of row-address and R/W-type switches in memory. In addition, when selecting the next sequence to grant, CGH gives priority to requests whose row address and R/W type match those of the most recently granted transaction, which further improves memory access efficiency. Applied to the PKUnity-SK SoC, CGH improves system memory access performance by 21.37% while increasing bus area by only 2.83%. Because fewer row-address switches occur, memory energy consumption also falls by 15.15%.
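The two ideas in the abstract (keep the grant on a run of same-row, same-type transactions, and prefer a matching request at handoff) can be illustrated with a small Python sketch. This is not the authors' hardware design: the `Txn` record, the per-device queues, the round-robin baseline, and the rotating fallback used when no request matches are all assumptions made for illustration, and the model ignores timing, burst lengths, and fairness limits.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical outstanding transaction: issuing device, DRAM row, R/W type."""
    device: int
    row: int
    is_write: bool


def same_row_and_type(a, b):
    """True when two transactions hit the same row with the same R/W type."""
    return a.row == b.row and a.is_write == b.is_write


def round_robin(queues):
    """Baseline (assumed): grant one transaction per device in turn."""
    queues = [deque(q) for q in queues]
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


def cgh_like(queues):
    """Simplified CGH-style arbitration over per-device pending queues."""
    queues = [deque(q) for q in queues]
    n = len(queues)
    order = []
    last = None      # most recently granted transaction
    current = None   # index of the device that currently holds the grant
    while any(queues):
        holds = (
            current is not None
            and queues[current]
            and same_row_and_type(queues[current][0], last)
        )
        if not holds:
            # Grant handoff: scan devices starting after the current owner
            # (assumed rotating order), preferring a head-of-queue request
            # that matches the last granted row and R/W type.
            start = 0 if current is None else current + 1
            ring = [(start + k) % n for k in range(n)]
            candidates = [i for i in ring if queues[i]]
            matching = [
                i for i in candidates
                if last is not None and same_row_and_type(queues[i][0], last)
            ]
            current = (matching or candidates)[0]
        last = queues[current].popleft()
        order.append(last)
    return order


def switches(order):
    """Count row-address or R/W-type changes between consecutive grants."""
    return sum(
        1 for a, b in zip(order, order[1:]) if not same_row_and_type(a, b)
    )
```

With three devices issuing, respectively, four reads to row 1, four writes to row 2, and two reads to row 1, the baseline interleaves the streams and the sequence seen by memory changes row or type seven times, while the CGH-like policy drains both row-1 read streams before the row-2 writes and changes only once.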

Authors: Liu Dan, Feng Yi, Tong Dong, Cheng Xu, Wang Keyi

Affiliation: Microprocessor Research and Development Center, Peking University

Source: Journal of Computer Research and Development (indexed in EI, CSCD, and the Peking University Core Journals list), 2012, No. 5, pp. 1061-1071 (11 pages)

Funding: National High Technology Research and Development Program of China (863 Program), grant 2006AA010202

Keywords: system-on-chip; bus arbitration; memory controller; energy consumption; overhead

Classification: TP302.2 [Automation and Computer Technology: Computer System Architecture]



