The Performance Evaluation and Analysis of Data Fusion Optimization on IA-64 Computer
Abstract: Reference [1] proposed a data fusion optimization across arrays and evaluated it on an IA-32 server. Those results showed that, on IA-32 machines, data fusion guided by a performance-cost model noticeably improves the cache utilization of applications with non-contiguous data access patterns. How well does the optimization work on IA-64, the newer architecture? This paper uses an Intel IA-32 server and an HP Itanium server as test platforms. The programs before and after the data fusion transformation are compiled with the Intel Fortran compilers ifc (IA-32) and efc (IA-64) and with the free-software compiler g95, then run to collect execution times and related performance data on both platforms. The results show that source-level data fusion does not cooperate well with the high-level optimizations of the efc compiler on IA-64: when the test program is compiled with efc at -O3, the effect of the optimization is negative, that is, the transformed program runs longer than the untransformed one. The results further indicate that high-level compiler optimizations such as data prefetching, loop transformation, and data transformation must be considered together, taking the characteristics of the target architecture into account, to achieve a good overall result. The paper is offered as a starting point for studying how well compiler optimization algorithms designed for IA-32 carry their performance over to IA-64.
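The abstract does not spell out the transformation itself. As a rough illustration only, the sketch below assumes that "data fusion" means merging arrays that are accessed together into one interleaved array, so that a strided (non-contiguous) access finds both operands next to each other in memory. It is written in C for brevity, whereas the paper's test programs are Fortran; the names `fused_t`, `fuse`, `sum_separate`, and `sum_fused` are hypothetical and are not taken from the paper or from reference [1].

```c
#include <assert.h>
#include <stddef.h>

/* Before fusion (assumed layout): two separate arrays read at the same
   strided index. a[i] and b[i] typically lie in different cache lines. */
double sum_separate(const double *a, const double *b, size_t n, size_t stride)
{
    double s = 0.0;
    assert(stride > 0);              /* a zero stride would never terminate */
    for (size_t i = 0; i < n; i += stride)
        s += a[i] * b[i];
    return s;
}

/* After fusion (hypothetical type): element i of both arrays is stored
   side by side, so one strided access usually brings in both operands. */
typedef struct {
    double a;
    double b;
} fused_t;

/* Copy the two source arrays into the interleaved layout. */
void fuse(const double *a, const double *b, fused_t *f, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        f[i].a = a[i];
        f[i].b = b[i];
    }
}

/* Same computation as sum_separate, expressed on the fused layout. */
double sum_fused(const fused_t *f, size_t n, size_t stride)
{
    double s = 0.0;
    assert(stride > 0);
    for (size_t i = 0; i < n; i += stride)
        s += f[i].a * f[i].b;
    return s;
}
```

Both functions compute the same result; only the memory layout differs. Whether the fused layout is actually faster depends on the compiler and the machine, which is the question the paper examines: a layout change that helps on IA-32 may interfere with what the IA-64 compiler does at -O3.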
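Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2005, No. 15, pp. 1-4, 16 (5 pages in total)
Funding: National High-Tech Research and Development Program of China (863 Program), Grant Nos. 2002AA1Z2101 and 2004AA1Z2210
Keywords: data fusion, locality, loop transformation, data prefetch, IA-32, IA-64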
References (10)

  • 1 Zeng Lifang, Yang Xuejun, Xia Jun, Chen Juan. A Method Using Data Fusion to Improve Locality and Reduce False Sharing[J]. Chinese Journal of Computers (计算机学报), 2004, 27(1): 32-41.
  • 2 David Parello, Olivier Temam, Jean-Marie Verdun. On Increasing Architecture Awareness in Program Optimizations to Bridge the Gap between Peak and Sustained Processor Performance: Matrix-Multiply Revisited. Supercomputing, Baltimore, USA, 2002.
  • 3 M Wolf, M Lam. A Data Locality Optimizing Algorithm[C]. In: Proc ACM SIGPLAN Conf Prog Lang Des & Impl, 1991: 30-44.
  • 4 K S McKinley, S Carr, C W Tseng. Improving Data Locality with Loop Transformations[J]. ACM Transactions on Programming Languages and Systems, 1996, 18(4): 421-453.
  • 5 Y Song, Z Li. New Tiling Techniques to Improve Cache Temporal Locality[C]. In: Proceedings of the SIGPLAN'99 Conference on Programming Languages Design and Implementation, Atlanta, GA, 1999-05.
  • 6 Naraig Manjikian, Tarek S Abdelrahman. Fusion of Loops for Parallelism and Locality[J]. IEEE Transactions on Parallel and Distributed Systems, 1997, 8(2): 193-209.
  • 7 M Wolfe. High Performance Compilers for Parallel Computing[M]. Redwood City: Addison-Wesley Publishing Company, 1996.
  • 8 M Kandemir, A Choudhary, N Shenoy et al. A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests[C]. In: Proc 1998 ACM International Conference on Supercomputing (ICS'98), 1998.
  • 9 M O'Boyle, P Knijnenburg. Non-Singular Data Transformation: Definition, Validity, Applications[C]. In: Proc 6th Workshop on Compilers for Parallel Computers, Aachen, Germany, 1996: 287-297.
  • 10 U Kremer. Automatic Data Layout for Distributed Memory Machines[D]. PhD thesis. Dept of Computer Science, Rice University, 1995-10.
