Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency



Abstract: Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to communicate values between loads and older in-flight stores and to detect store-load order violations. However, this in-flight forwarding accounts for only about 15% of all store-load communication, which makes the CAM-based microarchitecture the major bottleneck to scaling store-load communication further. This paper presents a new microarchitecture named ASW (short for active store window). It provides a new structure, the speculative active store window, which performs store-load forwarding more aggressively and speculatively than a conventional LSQ. This structure can forward the data of committed stores to executing loads without accessing the L1 data cache, which the paper refers to as far forwarding. At the back end of the pipeline, ASW uses in-order load re-execution, filtered by a tagged SSBF (short for store sequence Bloom filter), to verify the correctness of store-load forwarding. The speculative active store window and the tagged store sequence Bloom filter are both set-associative structures, which are more efficient and scalable than fully associative structures. Experiments show that this simpler and faster design outperforms a conventional load/store-queue-based design and the NoSQ design on most benchmarks, by 10.22% and 8.71% respectively.
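To make the contrast between a fully associative search and a set-associative lookup concrete, here is a minimal Python sketch of a store table in which a load compares tags only within the one set selected by its address. It is a toy model under stated assumptions, not the paper's design: the names `StoreWindow`, `Entry`, `store`, and `load`, the table sizes, the modulo indexing, and the oldest-first replacement are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    tag: int    # upper address bits that identify the address within a set
    value: int  # data written by the store
    ssn: int    # store sequence number (assumed monotonic age counter)


class StoreWindow:
    """Toy set-associative table of recent stores (illustrative only)."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]
        self.next_ssn = 1

    def store(self, addr, value):
        """Record a store and return the sequence number assigned to it."""
        index, tag = addr % self.num_sets, addr // self.num_sets
        ssn = self.next_ssn
        self.next_ssn += 1
        entries = self.sets[index]
        # Replace any older entry for the same address with the new one.
        entries[:] = [e for e in entries if e.tag != tag]
        entries.append(Entry(tag, value, ssn))
        if len(entries) > self.ways:
            entries.pop(0)  # assumed policy: evict the oldest entry in the set
        return ssn

    def load(self, addr) -> Optional[Entry]:
        """Look up a load; only the entries of one set are compared."""
        index, tag = addr % self.num_sets, addr // self.num_sets
        for entry in self.sets[index]:
            if entry.tag == tag:
                return entry  # forward the store's value to the load
        return None  # miss: the load would read the data cache instead


window = StoreWindow(num_sets=2, ways=1)
window.store(16, 7)
hit = window.load(16)       # same set, same tag: value is forwarded
window.store(18, 9)         # maps to the same set and evicts address 16
miss = window.load(16)      # no longer in the table
```

In this sketch a lookup touches at most `ways` entries regardless of the total table size, which is the property that makes set-associative structures easier to scale than a CAM that must compare against every entry. Because entries can be evicted or stale, a real design needs a separate check that forwarded values are correct; the abstract describes filtered in-order load re-execution for that purpose, which this sketch does not model.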

Authors: 张栚滈, 王箫音, 佟冬, 易江芳, 陆俊林, 王克义

Affiliations: Microprocessor Research and Development Center; Engineering Research Center of Microprocessor and System; School of Electronics Engineering and Computer Science

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition); indexed in SCIE, EI, CSCD; 2012, Issue 4, pp. 769-780 (12 pages)

Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2009ZX01029-001-002 and the Postdoctoral Science Foundation of China under Grant No. 20110490208.

Keywords: store-load forwarding, load/store queue, value-based load re-execution

Classification: TP333 [Automation and Computer Technology: Computer System Architecture]


References (22)

1. Wulf W A, McKee S A. Hitting the memory wall: Implications of the obvious. Computer Architecture News, 1995, 23(1): 20-24.
2. Park I, Ooi C L, Vijaykumar T N. Reducing design complexity of the load/store queue. In Proc. the 36th MICRO, San Diego, USA, Dec. 3-5, 2003, pp.411-422.
3. Gandhi A, Akkary H, Rajwar R, Srinivasan S T, Lai K. Scalable load and store processing in latency tolerant processors. In Proc. the 32nd ISCA, Madison, USA, June 4-8, 2005, pp.446-457.
4. Pericas M, Cristal A, Cazorla F J, González R, Veidenbaum A, Jimenez D A, Valero M. A two-level load/store queue based on execution locality. In Proc. the 35th ISCA, Beijing, China, June 21-25, 2008, pp.25-36.
5. Sethumadhavan S, Desikan R, Burger D, Moore C R, Keckler S W. Scalable hardware memory disambiguation for high ILP processors. In Proc. the 36th MICRO, San Diego, USA, Dec. 3-5, 2003, pp.399-410.
6. Baugh L, Zilles C. Decomposing the load-store queue by function for power reduction and scalability. IBM Journal of Research and Development, 2006, 50(2/3): 287-297.
7. Sha T T, Martin M M K, Roth A. Scalable store-load forwarding via store queue index prediction. In Proc. the 38th MICRO, Barcelona, Spain, Nov. 12-16, 2005, pp.159-170.
8. Stone S S, Woley K M, Frank M I. Address-indexed memory disambiguation and store-to-load forwarding. In Proc. the 38th MICRO, Barcelona, Spain, Nov. 12-16, 2005, pp.171-182.
9. Roesner F, Burger D, Keckler S W. Counting dependence predictors. In Proc. the 35th ISCA, Beijing, China, June 21-25, 2008, pp.215-226.
10. Sha T T, Martin M M K, Roth A. NoSQ: Store-load communication without a store queue. In Proc. the 39th MICRO, Orlando, USA, Dec. 9-13, 2006, pp.285-296.

