Active Store Window: Enabling Far Store-Load Forwarding with Scalability and Complexity-Efficiency

Abstract: Conventional dynamically scheduled processors often use a fully associative structure, the load/store queue (LSQ), to communicate values between loads and older in-flight stores and to detect store-load ordering violations. However, this in-flight forwarding accounts for only about 15% of all store-load communication, which makes the CAM-based micro-architecture the major bottleneck to scaling store-load communication further. This paper presents a new micro-architecture named ASW (short for active store window). It provides a new structure, the speculative active store window, which performs store-load forwarding more aggressively and speculatively than a conventional LSQ. This structure can forward the data of committed stores to executing loads without accessing the L1 data cache, which this paper refers to as far forwarding. At the back end of the pipeline, ASW uses in-order load re-execution, filtered by a tagged SSBF (short for store sequence Bloom filter), to verify the correctness of store-load forwarding. The speculative active store window and the tagged store sequence Bloom filter are both set-associative structures, which are more efficient and scalable than fully associative structures. Experiments show that this simpler and faster design outperforms a conventional LSQ-based design and the NoSQ design on most benchmarks, by 10.22% and 8.71% respectively.
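To make the two mechanisms named in the abstract more concrete, the following is a minimal Python sketch of far forwarding from a set-associative store window, plus an SSBF-style filter that decides whether a committing load must re-execute. It is a toy model under stated assumptions, not the paper's design: the class names `SetAssocTable` and `ToyASW`, the table sizes, the evict-oldest replacement policy, and the use of a plain dictionary in place of the L1 data cache are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entry:
    tag: int   # address bits above the set index
    data: int  # value written by the store
    ssn: int   # store sequence number (larger means younger)


class SetAssocTable:
    """Toy set-associative table keyed by address (sizes are assumptions)."""

    def __init__(self, num_sets: int = 4, ways: int = 2):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]

    def insert(self, addr: int, data: int, ssn: int) -> None:
        entries = self.sets[addr % self.num_sets]
        tag = addr // self.num_sets
        # Keep only the youngest store per address.
        entries[:] = [e for e in entries if e.tag != tag]
        if len(entries) == self.ways:
            entries.pop(0)  # evict the oldest entry in the set
        entries.append(Entry(tag, data, ssn))

    def lookup(self, addr: int) -> Optional[Entry]:
        tag = addr // self.num_sets
        for e in self.sets[addr % self.num_sets]:
            if e.tag == tag:
                return e
        return None


class ToyASW:
    """Hypothetical model: store window for forwarding, tagged SSBF for checking."""

    def __init__(self):
        self.window = SetAssocTable()  # speculative active store window
        self.ssbf = SetAssocTable()    # address -> SSN of last committed store
        self.memory = {}               # stands in for the L1 data cache
        self.next_ssn = 1

    def execute_store(self, addr: int, data: int) -> int:
        ssn = self.next_ssn
        self.next_ssn += 1
        self.window.insert(addr, data, ssn)
        return ssn

    def commit_store(self, addr: int, data: int, ssn: int) -> None:
        # The window entry is NOT removed at commit, so later loads can
        # still be forwarded from it ("far forwarding").
        self.memory[addr] = data
        self.ssbf.insert(addr, data, ssn)

    def execute_load(self, addr: int) -> Tuple[int, int]:
        """Return (value, SSN of the forwarding store, or 0 if read from memory)."""
        e = self.window.lookup(addr)
        if e is not None:
            return e.data, e.ssn
        return self.memory.get(addr, 0), 0

    def needs_reexecution(self, addr: int, forwarded_ssn: int) -> bool:
        # Re-execute only if a store younger than the one the load saw has
        # committed to this address. A real filter would also have to treat
        # evicted SSBF entries conservatively; this sketch ignores that.
        e = self.ssbf.lookup(addr)
        return e is not None and e.ssn > forwarded_ssn
```

In this sketch, a load that executes after a store to the same address has committed still hits in the window and is filtered out of re-execution, whereas a load that read memory before an older store executed is flagged by `needs_reexecution` once that store commits.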
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, and CSCD; 2012, Issue 4, pp. 769-780 (12 pages)
Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2009ZX01029-001-002 and the Postdoctoral Science Foundation of China under Grant No. 20110490208
Keywords: store-load forwarding, load/store queue, value-based load re-execution

References (22)

  • 1. Wulf W A, McKee S A. Hitting the memory wall: Implications of the obvious. Computer Architecture News, 1995, 23(1): 20-24.
  • 2. Park I, Ooi C L, Vijaykumar T N. Reducing design complexity of the load/store queue. In Proc. the 36th MICRO, San Diego, USA, Dec. 3-5, 2003, pp.411-422.
  • 3. Gandhi A, Akkary H, Rajwar R, Srinivasan S T, Lai K. Scalable load and store processing in latency tolerant processors. In Proc. the 32nd ISCA, Madison, USA, June 4-8, 2005, pp.446-457.
  • 4. Pericas M, Cristal A, Cazorla F J, González R, Veidenbaum A, Jimenez D A, Valero M. A two-level load/store queue based on execution locality. In Proc. the 35th ISCA, Beijing, China, June 21-25, 2008, pp.25-36.
  • 5. Sethumadhavan S, Desikan R, Burger D, Moore C R, Keckler S W. Scalable hardware memory disambiguation for high ILP processors. In Proc. the 36th MICRO, San Diego, USA, Dec. 3-5, 2003, pp.399-410.
  • 6. Baugh L, Zilles C. Decomposing the load-store queue by function for power reduction and scalability. IBM Journal of Research and Development, 2006, 50(2/3): 287-297.
  • 7. Sha T T, Martin M M K, Roth A. Scalable store-load forwarding via store queue index prediction. In Proc. the 38th MICRO, Barcelona, Spain, Nov. 12-16, 2005, pp.159-170.
  • 8. Stone S S, Woley K M, Frank M I. Address-indexed memory disambiguation and store-to-load forwarding. In Proc. the 38th MICRO, Barcelona, Spain, Nov. 12-16, 2005, pp.171-182.
  • 9. Roesner F, Burger D, Keckler S W. Counting dependence predictors. In Proc. the 35th ISCA, Beijing, China, June 21-25, 2008, pp.215-226.
  • 10. Sha T T, Martin M M K, Roth A. NoSQ: Store-load communication without a store queue. In Proc. the 39th MICRO, Orlando, USA, Dec. 9-13, 2006, pp.285-296.
