Data Prefetching Technique of Nonlinear Memory Access

Abstract: With static analysis alone, a compiler can hardly prefetch correctly the data touched by memory accesses that follow nonlinear patterns. Profiling, however, captures the patterns by which a program actually accesses memory at run time, and this information lets the compiler insert prefetch instructions precisely. Building on the stride profiling technique, this paper proposes a new type of profile information, stride iterative, which reflects the actual run-time behavior of memory-access instructions more accurately than ordinary profiling. Combined with alias analysis results, which are used to adjust prefetches that target the same cache line, the approach achieves better prefetching performance than ordinary data prefetching. On Itanium 2, the 12 integer benchmarks of SPEC CPU2000 improve by 8.54% on average, and mcf improves by as much as 77.87%.
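The abstract does not define the stride iterative profile type, so the following is only a minimal sketch, under stated assumptions, of the ordinary stride profiling it builds on. It shows a per-load table of observed address differences, a test for a dominant stride, a profile-guided prefetch in a pointer-chasing loop using the GCC/Clang builtin `__builtin_prefetch`, and a simple filter that drops a second prefetch into the same cache line when alias analysis is assumed to have proved that two candidates share a base pointer. All type and function names (`StrideProfile`, `profile_access`, `dominant_stride`, `sum_list`, `PrefetchCand`, `dedup_same_line`) are hypothetical and not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-load profile for ORDINARY stride profiling: each time the
 * load executes, the difference from its previous address is counted.
 * This is NOT the paper's "stride iterative" profile type. */
#define MAX_STRIDES 8

typedef struct {
    uintptr_t last_addr;            /* address of the previous execution */
    int has_last;                   /* 0 until the first address is seen */
    long total;                     /* number of strides observed        */
    intptr_t stride[MAX_STRIDES];   /* distinct strides seen so far      */
    long count[MAX_STRIDES];        /* how often each stride occurred    */
    int n;                          /* number of used table entries      */
} StrideProfile;

/* Instrumentation hook: called with the load's address on every execution. */
void profile_access(StrideProfile *p, uintptr_t addr) {
    if (p->has_last) {
        intptr_t s = (intptr_t)(addr - p->last_addr);
        int i = 0;
        while (i < p->n && p->stride[i] != s) i++;
        if (i == p->n && p->n < MAX_STRIDES) { /* new stride, room left */
            p->stride[i] = s;
            p->count[i] = 0;
            p->n++;
        }
        if (i < p->n) p->count[i]++; /* strides that overflow the table only raise total */
        p->total++;
    }
    p->last_addr = addr;
    p->has_last = 1;
}

/* Returns 1 and sets *out if the most frequent stride covers at least
 * min_percent of all observed strides; otherwise returns 0 (no prefetch). */
int dominant_stride(const StrideProfile *p, int min_percent, intptr_t *out) {
    int best = -1;
    for (int i = 0; i < p->n; i++)
        if (best < 0 || p->count[i] > p->count[best]) best = i;
    if (best < 0 || p->count[best] * 100 < (long)min_percent * p->total)
        return 0;
    *out = p->stride[best];
    return 1;
}

typedef struct Node { long value; struct Node *next; } Node;

/* Pointer-chasing loop with a profile-guided prefetch (GCC/Clang builtin).
 * If nodes happen to be laid out with a regular stride, the address
 * distance*stride bytes ahead is likely a node visited soon; a wrong guess
 * only wastes the prefetch, since data prefetches do not fault. */
long sum_list(const Node *p, intptr_t stride, int distance) {
    long sum = 0;
    for (; p != NULL; p = p->next) {
        __builtin_prefetch((const void *)((uintptr_t)p + (uintptr_t)(stride * distance)));
        sum += p->value;
    }
    return sum;
}

/* Prefetch candidate: base_id is a hypothetical label meaning "alias analysis
 * proved these accesses use the same base pointer". */
typedef struct { int base_id; long offset; } PrefetchCand;

/* Floor division, so negative offsets map to the correct line index. */
static long line_index(long offset, long line_size) {
    return offset >= 0 ? offset / line_size
                       : -((-offset + line_size - 1) / line_size);
}

/* Keeps only the first candidate per (base, cache line), compacting the kept
 * ones to the front of c. Assumes each base pointer is line-aligned. */
size_t dedup_same_line(PrefetchCand *c, size_t n, long line_size) {
    assert(line_size > 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int dup = 0;
        for (size_t j = 0; j < kept && !dup; j++)
            dup = c[j].base_id == c[i].base_id &&
                  line_index(c[j].offset, line_size) ==
                      line_index(c[i].offset, line_size);
        if (!dup) c[kept++] = c[i];
    }
    return kept;
}
```

For example, profiling a list whose nodes were allocated back to back in one array yields a single dominant stride equal to `sizeof(Node)`, which `sum_list` can then use as its prefetch stride even though the loop itself only follows `next` pointers. The `min_percent` threshold, the table size, and the prefetch distance are illustrative choices, not values from the paper; a compiler targeting Itanium 2 would typically emit the architecture's `lfetch` instruction directly rather than call a builtin.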

Authors: Wu Jiajun, Feng Xiaobing, Zhang Zhaoqing

Institution: Institute of Computing Technology, Chinese Academy of Sciences

Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University core journal list), 2007, no. 2, pp. 355-360 (6 pages)

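Funding: Major Software Special Project of the National High Technology Research and Development Program of China ("863" Program) (2004AA1Z2200)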

Keywords: data prefetch; compiler; profiling; performance analysis; cache; nonlinear

Chinese Library Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]
