Performance Portability Testing and Analysis of Data Fusion Optimization on IA-64 Machines

Published English title: The Performance Evaluation and Analysis of Data Fusion Optimization on IA-64 Computer


Abstract: Reference [1] proposed a data fusion optimization across arrays and evaluated it on an IA-32 server. Those results showed that, on IA-32 machines and under the control of a performance-cost model, data fusion improves cache utilization for applications with non-contiguous data access patterns. How does the optimization fare on the newer IA-64 architecture? This paper uses an Intel IA-32 server and an HP Itanium server as test platforms. It compiles the programs before and after the data fusion transformation with the Intel Fortran compilers ifc (IA-32) and efc (IA-64) and with the free-software compiler g95, then collects execution times and related performance data on both platforms. The results show that source-level data fusion does not cooperate well with the high-level optimizations of the efc compiler on IA-64: when the test program is compiled with efc -O3, the effect is negative, that is, the transformed program runs longer than the original. The results further indicate that high-level compiler optimizations such as data prefetching, loop transformation, and data transformation must be considered together, in light of the characteristics of the target architecture, to achieve a good global result. The paper is intended as a starting point for studying how well compiler optimization algorithms designed for IA-32 carry their performance over to IA-64.
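The abstract does not spell out the transformation itself, so the following is only a minimal sketch under an assumption: that data fusion here means interleaving the elements of arrays that are accessed together into one array (often called array merging), so that a[i] and b[i] become neighbors in memory. The function names `fuse`, `strided_dot_separate`, and `strided_dot_fused` are hypothetical and are not taken from the paper or from reference [1]. Python cannot show the cache effect; the sketch only shows that the layout change preserves the result of a strided (non-contiguous) access loop.

```python
def fuse(a, b):
    """Interleave two equal-length arrays: fused[2*i] = a[i], fused[2*i+1] = b[i]."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    fused = [0.0] * (2 * len(a))
    fused[0::2] = a  # even slots hold the elements of a
    fused[1::2] = b  # odd slots hold the elements of b
    return fused


def strided_dot_separate(a, b, stride):
    """Strided loop over two separate arrays: each step touches two distant locations."""
    return sum(a[i] * b[i] for i in range(0, len(a), stride))


def strided_dot_fused(fused, stride):
    """Same computation on the fused layout: each step touches two adjacent slots."""
    n = len(fused) // 2
    return sum(fused[2 * i] * fused[2 * i + 1] for i in range(0, n, stride))


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    fused = fuse(a, b)
    # Both layouts must give the same numerical result.
    print(strided_dot_separate(a, b, 2), strided_dot_fused(fused, 2))
```

In a compiled language the fused version would touch one cache line per iteration instead of two, which is the kind of gain reported for IA-32. Whether that gain survives a given compiler's own prefetching and loop transformations is exactly what the paper measures.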

Authors: Zeng Lifang, Yang Xuejun

Affiliation: School of Computer Science, National University of Defense Technology

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2005, No. 15, pp. 1-4, 16 (5 pages)

Funding: National High Technology Research and Development Program of China (863 Program), grant Nos. 2002AA1Z2101 and 2004AA1Z2210

Keywords: data fusion, locality, loop transformation, data prefetch, IA-32, IA-64

Classification: TP31 [Automation and Computer Technology: Computer Software and Theory]


References (10)

[1] Zeng Lifang, Yang Xuejun, Xia Jun, Chen Juan. A Method Using Data Fusion to Improve Locality and Reduce False Sharing[J]. Chinese Journal of Computers, 2004;27(1):32-41.
[2] David Parello, Olivier Temam, Jean-Marie Verdun. On Increasing Architecture Awareness in Program Optimizations to Bridge the Gap between Peak and Sustained Processor Performance: Matrix-Multiply Revisited[C]. In: Supercomputing, Baltimore, USA, 2002.
[3] M Wolf, M Lam. A Data Locality Optimizing Algorithm[C]. In: Proc ACM SIGPLAN Conf Prog Lang Des & Impl, 1991:30-44.
[4] K S McKinley, S Carr, C W Tseng. Improving Data Locality with Loop Transformations[J]. ACM Transactions on Programming Languages and Systems, 1996;18(4):421-453.
[5] Y Song, Z Li. New Tiling Techniques to Improve Cache Temporal Locality[C]. In: Proceedings of the SIGPLAN'99 Conference on Programming Languages Design and Implementation, Atlanta, GA, 1999-05.
[6] Naraig Manjikian, Tarek S Abdelrahman. Fusion of Loops for Parallelism and Locality[J]. IEEE Transactions on Parallel and Distributed Systems, 1997;8(2):193-209.
[7] M Wolfe. High Performance Compilers for Parallel Computing[M]. Redwood City: Addison-Wesley Publishing Company, 1996.
[8] M Kandemir, A Choudhary, N Shenoy et al. A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests[C]. In: Proc 1998 ACM International Conference on Supercomputing (ICS'98), 1998.
[9] M O'Boyle, P Knijnenburg. Non-Singular Data Transformation: Definition, Validity, Applications[C]. In: Proc 6th Workshop on Compilers for Parallel Computers, Aachen, Germany, 1996:287-297.
[10] U Kremer. Automatic Data Layout for Distributed Memory Machines[D]. PhD thesis. Dept of Computer Science, Rice University, 1995-10.


