Data Prefetching Technique of Nonlinear Memory Access (cited by: 1)
Abstract: With static analysis alone, a compiler can rarely insert correct prefetches for memory accesses that follow a nonlinear pattern. Profiling, however, can capture the access pattern a program actually exhibits at run time, and the compiler can use this information to insert prefetch instructions accurately. Building on stride profiling, this paper proposes a new profile information type, stride iterative, which reflects the run-time behavior of memory access instructions more precisely than ordinary stride profiling. Combined with alias analysis results to adjust prefetches that target the same cache line, the approach achieves better prefetch performance than ordinary data prefetching. On Itanium 2, the 12 integer benchmarks of CPU2000 improve by 8.54% on average, and mcf improves by 77.87%.
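To make the idea of stride profiling concrete, the following is a minimal sketch of ordinary stride profiling, not the paper's stride iterative scheme, whose details the abstract does not give. It assumes an address trace of `(load_site, address)` pairs is available from an instrumented run; the names `profile_strides`, `dominant_stride`, `plan_prefetches`, and the parameters `distance` and `min_share` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def profile_strides(trace):
    """Count address deltas between successive executions of each load site.

    trace: iterable of (load_site, address) pairs in execution order.
    Returns {load_site: Counter mapping stride -> occurrences}.
    """
    last_addr = {}
    strides = defaultdict(Counter)
    for site, addr in trace:
        if site in last_addr:
            strides[site][addr - last_addr[site]] += 1
        last_addr[site] = addr
    return strides


def dominant_stride(counter, min_share=0.7):
    """Return the most frequent stride if it covers at least min_share
    of the samples for this load site, otherwise None."""
    total = sum(counter.values())
    if total == 0:
        return None
    stride, hits = counter.most_common(1)[0]
    return stride if hits / total >= min_share else None


def plan_prefetches(trace, distance=4, min_share=0.7):
    """Map each load site with a stable non-zero stride to a prefetch
    offset (stride * distance). Both parameters are illustrative
    assumptions, not values taken from the paper."""
    plan = {}
    for site, counter in profile_strides(trace).items():
        stride = dominant_stride(counter, min_share)
        if stride:  # skip None and a zero stride (same address repeated)
            plan[site] = stride * distance
    return plan


# A pointer-chasing load whose nodes happen to be allocated 48 bytes
# apart (with one jump to a new region), plus an irregular load.
trace = (
    [("ld_next", a) for a in
     (1000, 1048, 1096, 1144, 5000, 5048, 5096, 5144, 5192)]
    + [("ld_rand", a) for a in (10, 500, 30, 900, 77)]
)
print(plan_prefetches(trace))  # {'ld_next': 192}
```

In this sketch the load `ld_next` looks irregular to a static analysis because its address comes from a pointer, yet the profile shows a stride of 48 bytes in 7 of 8 samples, so a prefetch 192 bytes ahead is planned. The load `ld_rand` has no dominant stride and receives no prefetch.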
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the PKU Core list; 2007, No. 2, pp. 355-360 (6 pages)
Funding: National High Technology Research and Development Program of China (863 Program), Major Software Special Project (2004AA1Z2200)
Keywords: data prefetch; compiler; profiling; performance analysis; cache; nonlinear