Abstract
This paper presents compiler implementation techniques for 80-bit floating-point arithmetic and analyzes the performance bottlenecks of a suite of scientific computing benchmarks on the IA-64 platform. Exploiting features of the IA-64 architecture, we improve and implement four compiler optimizations: automatic inlining of user-defined functions, high-level loop transformations, data prefetching, and inline expansion of 80-bit floating-point math library functions. Test results show that these optimizations significantly improve both the serial and the parallel performance of 80-bit floating-point arithmetic on the benchmarks.
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal (北大核心)
2009, No. 1, pp. 154-158 (5 pages)
Funding
Key Program of the National Natural Science Foundation of China (Grant No. 60633050)