
Mixed Precision Finite Element Algorithm on Heterogeneous Architecture and Its CUDA Implementation (cited by: 1)

Mixed Precision Finite Element Algorithm on Heterogeneous Architecture
Abstract: Single precision has long been regarded as unsuitable for scientific computing. From an architectural standpoint, however, mixed-precision computing can exploit the full single-precision performance of vector units and GPGPU devices and deliver higher efficiency, for example by lowering communication bandwidth requirements and improving data transfer and communication efficiency. A mixed-precision explicit finite element algorithm is proposed and combined with msFEM, a multi-scale finite element program for strongly nonlinear materials, to achieve effective acceleration on an NVIDIA GPU. Experiments show that more than 90% of the computation in the mixed-precision explicit finite element program is performed in single precision, while the results agree with those of a fully double-precision calculation. The algorithm therefore makes scientific computing possible on accelerator cards that do not support the double-precision format. On GPUs that do support native double precision, the mixed-precision algorithm is 1.6 to 1.7 times faster than the fully double-precision calculation.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2012, No. 6, pp. 293-296 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (11072241)
Keywords: GPGPU; mixed precision algorithm; finite element method; parallel computing
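
The abstract does not say how the work is divided between the two precisions, so the following is only a minimal sketch of one common pattern, not the msFEM or CUDA implementation. Element-level forces are evaluated with single-precision rounding, while force assembly and the explicit nodal update stay in double precision. The 1D spring chain, the helper `f32`, the function `simulate`, and all parameter values are hypothetical and chosen purely for illustration.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def simulate(n_nodes=11, steps=200, dt=0.1, k=1.0, m=1.0, mixed=True):
    """Explicit time stepping of a 1D spring chain fixed at node 0.

    mixed=True:  element forces are rounded to single precision after
                 every operation; assembly and nodal update use double.
    mixed=False: everything is computed in double precision.
    """
    r = f32 if mixed else (lambda x: x)
    u = [i / (n_nodes - 1) for i in range(n_nodes)]  # initial uniform stretch
    v = [0.0] * n_nodes
    for _ in range(steps):
        force = [0.0] * n_nodes
        for e in range(n_nodes - 1):
            # Element-level work: emulated single precision when mixed=True.
            stretch = r(r(u[e + 1]) - r(u[e]))
            fe = r(r(k) * stretch)
            # Assembly into nodal forces in double precision.
            force[e] += fe
            force[e + 1] -= fe
        for i in range(1, n_nodes):  # node 0 is held fixed
            v[i] += dt * force[i] / m  # velocity update (double)
            u[i] += dt * v[i]          # displacement update (double)
    return u


u_double = simulate(mixed=False)
u_mixed = simulate(mixed=True)
max_diff = max(abs(a - b) for a, b in zip(u_double, u_mixed))
print(f"max |u_mixed - u_double| = {max_diff:.3e}")
```

In this toy setting the two displacement fields stay close because the state that is integrated over time is never stored in single precision. Only the per-element quantities recomputed at each step are rounded. Whether the same split is appropriate for a strongly nonlinear material model depends on the problem and would need to be checked case by case.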


