Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before CPU demands for it, has been considered as an effective solution to masking d...Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before CPU demands for it, has been considered as an effective solution to masking data access delay. However, current client-initiated prefetching strategies, where a computing processor initiates prefetching instructions, have many limitations. They do not work well for applications with complex, non-contiguous data access patterns. While technology advances continue to increase the gap between computing and data access performance, trading computing power for reducing data access delay has become a natural choice. In this paper, we present a serverbased data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called Data Push Server (DPS) initiates and proactively pushes data closer to the client in time. Issues, such as what data to fetch, when to fetch, and how to push are studied. The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS based prefetching. Simulation results show that L1 Cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.展开更多
Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support,...Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into mainstream. While some of the single-core prefetching techniques are directly applicable to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper aims to provide a comprehensive review of the state-of-the-art prefetching techniques, and proposes a taxonomy that classifies various design concerns in developing a prefetching strategy, especially for multicore processors. We compare various existing methods through analysis as well.展开更多
基金This research was supported in part by the National Science Foundation of U.S.A.under NSF Grant Nos. EIA-0224377,CNS-0406328,CNS-0509118,and CCF-0621435.
文摘Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before CPU demands for it, has been considered as an effective solution to masking data access delay. However, current client-initiated prefetching strategies, where a computing processor initiates prefetching instructions, have many limitations. They do not work well for applications with complex, non-contiguous data access patterns. While technology advances continue to increase the gap between computing and data access performance, trading computing power for reducing data access delay has become a natural choice. In this paper, we present a serverbased data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called Data Push Server (DPS) initiates and proactively pushes data closer to the client in time. Issues, such as what data to fetch, when to fetch, and how to push are studied. The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS based prefetching. Simulation results show that L1 Cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.
基金supported in part by the National Science Foundation of USA under Grant Nos.EIA-0224377,CNS-0406328,CNS-0509118,and CCF-0621435.
文摘Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into mainstream. While some of the single-core prefetching techniques are directly applicable to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper aims to provide a comprehensive review of the state-of-the-art prefetching techniques, and proposes a taxonomy that classifies various design concerns in developing a prefetching strategy, especially for multicore processors. We compare various existing methods through analysis as well.