The Research of the Implicit Algorithm and Program Based on the GPU


Abstract: A graphics processing unit (GPU) can speed up a desktop computer by one to two orders of magnitude (10 to 100 times), so it is important to develop implicit algorithms suited to the GPU. Based on the hardware characteristics of the GPU, the DP-LUR (data-parallel lower-upper relaxation) implicit method was selected and further improved. Because GPU algorithms need a low memory footprint, the matrix operations in the right-hand-side terms of the DP-LUR method were first rewritten into a scalar form that can be applied more generally. This improvement is fully equivalent to the original method but far more concise, and it greatly reduces memory storage and read/write requirements. Next, the left-hand-side matrix was diagonalized, which reduces memory storage and read/write requirements further and lowers the cost of a single iteration. However, it also slows convergence, so the total computational cost is about 20% higher than with the first improvement alone. The two improvements are independent of each other and can be adopted individually or jointly as needed.
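The ideas in the abstract can be illustrated on a small model problem. The sketch below is a minimal Python illustration under stated assumptions, not the authors' implementation. It applies a DP-LUR-style implicit step to a hypothetical 1D linear acoustic system u_t + A u_x = 0 with A = [[0, c], [c, 0]] on a periodic grid, using the split A± = (A ± ρI)/2 with ρ = c. The off-diagonal products A±ΔU are evaluated matrix-free from the flux difference F(U + ΔU) - F(U), so no Jacobian matrices are stored. This is a common stand-in for a scalar rewrite, not necessarily the paper's exact one, and it is exact here only because the flux is linear. The diagonal block reduces to the scalar 1/Δt + ρ/Δx. Each Jacobi-type sweep updates every cell using only the previous sweep's values, which is what makes the relaxation data-parallel; on a GPU each cell would map to one thread. All names (`flux`, `split_product`, `dp_lur_step`, `linear_residual`, `initial_state`) and parameter values are hypothetical.

```python
import math

# Hypothetical model problem (not from the paper): u_t + A u_x = 0,
# U = (p, v), A = [[0, C], [C, 0]], periodic 1D grid.
C = 1.0      # wave speed (assumed test value)
RHO = C      # spectral radius of A, used in the split A± = (A ± RHO*I)/2


def flux(u):
    """Physical flux F(U) = A U."""
    p, v = u
    return (C * v, C * p)


def split_product(u, du, sign):
    """Matrix-free A±(U)·dU = (F(U + dU) - F(U) + sign*RHO*dU) / 2.

    No Jacobian is stored; for this linear flux the difference is exact.
    """
    f1 = flux((u[0] + du[0], u[1] + du[1]))
    f0 = flux(u)
    return tuple(0.5 * (f1[m] - f0[m] + sign * RHO * du[m]) for m in range(2))


def numerical_flux(ul, ur):
    """Upwind-split interface flux A+ U_L + A- U_R (Rusanov form)."""
    fl, fr = flux(ul), flux(ur)
    return tuple(0.5 * (fl[m] + fr[m]) - 0.5 * RHO * (ur[m] - ul[m]) for m in range(2))


def residual(u, dx):
    """Explicit right-hand side R_i = -(F_{i+1/2} - F_{i-1/2}) / dx."""
    n = len(u)
    fh = [numerical_flux(u[i], u[(i + 1) % n]) for i in range(n)]  # fh[i] = F_{i+1/2}
    return [tuple(-(fh[i][m] - fh[i - 1][m]) / dx for m in range(2)) for i in range(n)]


def dp_lur_step(u, dt, dx, sweeps):
    """Return dU after `sweeps` Jacobi-type DP-LUR relaxations.

    Approximately solves d*dU_i + (A- dU_{i+1} - A+ dU_{i-1}) / dx = R_i
    with the scalar diagonal d = 1/dt + RHO/dx, so a division by d
    replaces a block-matrix inverse.
    """
    n = len(u)
    r = residual(u, dx)
    d = 1.0 / dt + RHO / dx
    du = [tuple(rm / d for rm in r[i]) for i in range(n)]
    for _ in range(sweeps):
        new = []
        for i in range(n):  # cells are independent: one GPU thread per cell
            left = split_product(u[i - 1], du[i - 1], +1.0)
            right = split_product(u[(i + 1) % n], du[(i + 1) % n], -1.0)
            new.append(tuple((r[i][m] + (left[m] - right[m]) / dx) / d for m in range(2)))
        du = new  # only previous-sweep values were read above
    return du


def linear_residual(u, du, dt, dx):
    """Max-norm residual of the linearized implicit system for a given dU."""
    n = len(u)
    r = residual(u, dx)
    d = 1.0 / dt + RHO / dx
    worst = 0.0
    for i in range(n):
        left = split_product(u[i - 1], du[i - 1], +1.0)
        right = split_product(u[(i + 1) % n], du[(i + 1) % n], -1.0)
        for m in range(2):
            worst = max(worst, abs(d * du[i][m] + (right[m] - left[m]) / dx - r[i][m]))
    return worst


def initial_state(n):
    """Cell-centred sine wave in p with v = 0 (hypothetical test data)."""
    dx = 1.0 / n
    return [(math.sin(2.0 * math.pi * (i + 0.5) * dx), 0.0) for i in range(n)]
```

For example, with `u = initial_state(50)`, `dx = 1/50` and `dt = 2*dx/C` (a CFL number of 2), 80 sweeps drive `linear_residual` to well below 1e-8, and the sum of ΔU over the periodic domain stays at zero up to round-off, reflecting conservation. In this model each sweep contracts the error by CFL/(1 + CFL), so larger time steps need more sweeps. This mirrors, qualitatively, the trade-off the abstract reports between cheaper iterations and slower convergence.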

Authors: Li Xuesong, Gu Chunwei

Affiliation: Department of Thermal Engineering, Tsinghua University

Source: Journal of Engineering Thermophysics, 2013, No. 11, pp. 2043-2047 (5 pages); indexed in EI, CAS, CSCD and the Peking University core journal list

Funding: National Natural Science Foundation of China (No. 51276092)

Keywords: GPU; implicit algorithm; DP-LUR method; low-memory-requirement algorithm

CLC classification: O354 [Science: Fluid Mechanics]
