Mixed-Precision Finite Element Algorithm for Heterogeneous Architectures and Its CUDA Implementation (cited by: 1)

Mixed Precision Finite Element Algorithm on Heterogeneous Architecture


Abstract: Single precision has long seemed out of place in scientific computing, where double precision is the norm. From an architectural standpoint, however, mixed-precision computing can fully exploit the single-precision performance of vector units and GPGPU devices and deliver higher efficiency, for example by lowering communication bandwidth requirements and improving data-transfer and communication efficiency. A mixed-precision explicit finite element algorithm was proposed and, combined with msFEM, a multi-scale finite element program for strongly nonlinear materials, implemented on NVIDIA GPUs with effective acceleration. Experimental results show that the mixed-precision explicit finite element program performs more than 90% of its computation in single precision while producing results consistent with those of a fully double-precision run. The algorithm therefore makes scientific computing possible on accelerator cards that do not support the double-precision format. On GPUs that do support native double-precision floating point, the mixed-precision algorithm is 1.6 to 1.7 times faster than the fully double-precision computation.
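The abstract does not say which parts of the computation run in single precision. As a minimal sketch only, the following assumes one plausible split: element-level force evaluation in single precision, with nodal displacements and velocities kept and updated in double precision. The 1D bar chain, the symplectic Euler time stepping, and the names `f32`, `internal_forces`, and `integrate` are illustrative assumptions, not the msFEM code or its CUDA kernels.

```python
import ctypes


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return ctypes.c_float(x).value


def internal_forces(u, k, single):
    """Assemble nodal forces for a 1D chain of two-node bar elements.

    If `single` is True, the element-level arithmetic is rounded to
    single precision; the nodal accumulation always stays in double.
    """
    r = f32 if single else (lambda x: x)
    f = [0.0] * len(u)
    for e in range(len(u) - 1):
        strain = r(u[e + 1] - u[e])   # element elongation
        fe = r(r(k) * strain)         # element force (single or double)
        f[e] += fe                    # stretched element pulls node e forward
        f[e + 1] -= fe                # and pulls node e+1 back
    return f


def integrate(n_el=8, k=1.0, m=1.0, dt=0.1, steps=200, single=False):
    """Explicit (symplectic Euler) time stepping; node 0 is clamped."""
    u = [0.01 * i for i in range(n_el + 1)]   # initial uniform stretch
    v = [0.0] * (n_el + 1)
    for _ in range(steps):
        f = internal_forces(u, k, single)
        for i in range(1, n_el + 1):
            v[i] += dt * f[i] / m             # state update in double
            u[i] += dt * v[i]
    return u


u_double = integrate(single=False)
u_mixed = integrate(single=True)
max_diff = max(abs(a - b) for a, b in zip(u_double, u_mixed))
print("max |u_double - u_mixed| =", max_diff)
```

In this toy setting the mixed-precision displacements stay close to the double-precision ones because rounding enters only through the element forces, while the accumulated state never loses precision. Whether the paper uses the same split is not stated in the abstract.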

Authors: Liu Jianhua (刘建华), Wang Chaowei (王朝尉), Ren Jiangyong (任江勇), Tian Rong (田荣)

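Affiliation: High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences

Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2012, No. 6, pp. 293-296 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (11072241)

Keywords: GPGPU; mixed-precision algorithm; finite element method; parallel computing

Classification: TP391.7 [Automation and Computer Technology: Computer Application Technology]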


