Implementation and Optimization of the 80-Bit Floating-Point Arithmetic (80位浮点运算的编译实现与优化)

Cited by: 1


Abstract: This paper presents a compiler implementation of 80-bit floating-point arithmetic and analyzes the performance bottlenecks of a suite of scientific computing benchmarks on the IA-64 platform. Exploiting the architectural features of IA-64, we improve and implement four compiler optimizations: automatic inlining of user-defined functions, high-level loop transformations, data prefetching, and inline expansion of 80-bit floating-point math library functions. Test results show that these optimizations significantly improve both the serial and the parallel performance of 80-bit floating-point arithmetic on the benchmarks.

Authors: Yang Canqun (杨灿群), Yang Xuejun (杨学军), Yi Huizhan (易会战), Li Chunjiang (李春江)

Affiliation: School of Computer Science, National University of Defense Technology (国防科技大学计算机学院)

Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list; 2009, No. 1, pp. 154-158 (5 pages)

Funding: Key Program of the National Natural Science Foundation of China (60633050)

Keywords: 80-bit floating-point arithmetic; IA-64 architecture; GCC compiler optimization

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]


