Server-Based Data Push Architecture for Multi-Processor Environments (cited by: 3)

Abstract: Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, in which data is fetched before the CPU demands it, has been considered an effective technique for masking data access delay. However, current client-initiated prefetching strategies, in which a computing processor issues the prefetch instructions, have many limitations: they do not work well for applications with complex, non-contiguous data access patterns. As technology advances continue to widen the gap between computing and data access performance, trading computing power for reduced data access delay has become a natural choice. In this paper, we present a server-based data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called the Data Push Server (DPS) initiates prefetching and proactively pushes data closer to the client in a timely manner. Issues such as what data to fetch, when to fetch it, and how to push it are studied. To evaluate DPS-based prefetching, the SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor. Simulation results show that the L1 cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.
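The abstract describes the push model only at a high level. As a rough illustration of how server-push differs from client-initiated prefetching, the toy Python sketch below models a client's L1 cache plus a separate push engine that watches the client's demand stream and inserts predicted lines into that cache; the client itself never issues a prefetch instruction. Everything here is an assumption for illustration: the names `LRUCache`, `StridePushServer`, and `miss_rate`, the 64-byte line size, the fully associative LRU cache, and the constant-stride predictor are hypothetical and are not the paper's DPS design, which also addresses when and how to push, aspects this lockstep model ignores.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value)


class LRUCache:
    """Fully associative LRU cache standing in for the client's L1 (hypothetical model)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line number -> True, ordered from LRU to MRU

    def contains(self, line):
        return line in self.lines

    def touch(self, line):
        # Insert or refresh a line as most recently used; evict the LRU line if full.
        if line in self.lines:
            self.lines.move_to_end(line)
        else:
            self.lines[line] = True
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)


class StridePushServer:
    """Hypothetical push engine running beside the client.

    It observes the client's demand accesses, detects a repeated constant
    stride, and pushes the next `depth` lines into the client's cache.
    The client never issues a prefetch instruction itself.
    """

    def __init__(self, depth=4):
        self.depth = depth
        self.last_line = None
        self.last_stride = None

    def observe_and_push(self, line, cache):
        if self.last_line is not None:
            stride = line - self.last_line
            # Push only when the current nonzero stride matches the previous one.
            if stride != 0 and stride == self.last_stride:
                for k in range(1, self.depth + 1):
                    cache.touch(line + k * stride)
            self.last_stride = stride
        self.last_line = line


def miss_rate(addresses, cache_lines=8, server=None):
    """Fraction of demand accesses that miss in the client cache."""
    cache = LRUCache(cache_lines)
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if not cache.contains(line):
            misses += 1
        cache.touch(line)  # demand fill or LRU update
        if server is not None:
            # Lockstep simplification: the server reacts after each demand access.
            server.observe_and_push(line, cache)
    return misses / len(addresses)


# Toy trace: walk an array with a 256-byte stride, so every access touches a new line.
trace = [i * 256 for i in range(1000)]
print("no push:  ", miss_rate(trace))
print("with push:", miss_rate(trace, server=StridePushServer(depth=4)))
```

On this constant-stride trace the baseline misses on every access, while the push engine leaves only the few warm-up misses needed to detect the stride. The toy trace says nothing about the SPEC CPU2000 results reported above; a stride predictor is used only because it is the simplest one to show, and a real push server would need richer prediction for the complex, non-contiguous patterns the abstract highlights.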
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2007, No. 5, pp. 641-652 (12 pages). Indexed in SCIE, EI, CSCD.
Funding: This research was supported in part by the National Science Foundation of the U.S.A. under NSF Grant Nos. EIA-0224377, CNS-0406328, CNS-0509118, and CCF-0621435.
Keywords: performance measurement, evaluation, modeling, simulation of multiple-processor system; cache memory