A hierarchical load balancing parallel computing approach for finite element structural analysis


Abstract: Multi-core clusters have become the primary tools for high-performance computing because of their computing power and cost-effectiveness. However, their different storage mechanisms and non-uniform communication latencies pose new challenges for the design of efficient parallel algorithms. Traditional domain decomposition methods achieve load balancing by direct partitioning: the structure is divided into equally sized subdomains whose number is determined by the number of processing cores involved in the parallel computation. As the number of processing cores in a single node of a multi-core cluster increases exponentially, the number of subdomains increases dramatically as well. A substantial increase in the number of subdomains rapidly enlarges both the size and the condition number of the interface equations, which degrades the numerical convergence of the system. It also considerably increases the number of processes involved in the parallel computation, which increases contention for the limited network ports and bandwidth. Slower numerical convergence and higher network communication overheads seriously impair the efficiency of solving the interface equations and greatly reduce the overall parallel efficiency of the domain decomposition method.
To make full use of the computing power of multi-core clusters and improve the parallel efficiency of large-scale finite element structural analysis, this paper proposes a hierarchical load balancing approach. The approach is built on fully exploiting the hierarchy and granularity of the computational tasks. To match the hardware topology of multi-core clusters, the computational tasks of finite element structural analysis are divided into three layers: inter-node parallelism, inter-chip parallelism, and inter-core parallelism. Inter-node and inter-chip parallelism use a coarse-grained parallel computing method, while inter-core parallelism uses a fine-grained one. By mapping the computing tasks to the different hardware layers of a multi-core cluster, the proposed method not only achieves load balancing at each layer but also greatly reduces the communication overheads of the system. Furthermore, it considerably reduces the number of subdomains and significantly improves the numerical convergence of the interface equations. To verify the effectiveness of the algorithms, two large-scale parallel numerical experiments on linear static finite element structural analysis were conducted on the "Tianhe-2" supercomputer. For each model, both the traditional domain decomposition method and the proposed hierarchical load balancing approach were run on 50, 100, 150, and 200 nodes. The test results show that the proposed method obtains higher speedup and parallel efficiency than the conventional domain decomposition method. The proposed approach can be applied to many kinds of structural analysis problems, including linear static analysis, nonlinear static analysis, and nonlinear dynamic analysis. The research in this paper focuses only on linear static analysis. Nonlinear and dynamic analyses involve multiple iteration steps, so for these and other kinds of structural analysis the proposed method can be wrapped and called as a sub-procedure, because the computation is still dominated by the solution of equations of the same kind.
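
The three-layer mapping described in the abstract can be illustrated with a minimal counting sketch. It is not the authors' implementation. The abstract does not state how many subdomains each layer receives, so the sketch assumes one coarse-grained subdomain per node at the first layer, one per chip at the second layer, and one thread per core inside each chip-level subdomain; it also assumes that the direct partition uses one subdomain per core. The node shape (2 chips of 12 cores each) and the names `ClusterShape`, `direct_partition_subdomains`, `hierarchical_layout`, and `split_evenly` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    """Hypothetical description of the hardware hierarchy."""
    nodes: int
    chips_per_node: int
    cores_per_chip: int


def direct_partition_subdomains(shape: ClusterShape) -> int:
    """Assumed traditional scheme: one subdomain (and process) per core."""
    return shape.nodes * shape.chips_per_node * shape.cores_per_chip


def hierarchical_layout(shape: ClusterShape) -> dict:
    """Assumed three-layer scheme: coarse-grained subdomains per node and
    per chip, fine-grained threads per core inside each chip."""
    return {
        "level1_subdomains": shape.nodes,                         # inter-node
        "level2_subdomains": shape.nodes * shape.chips_per_node,  # inter-chip
        "threads_per_level2_subdomain": shape.cores_per_chip,     # inter-core
    }


def split_evenly(n_items: int, n_parts: int) -> list:
    """Split n_items into n_parts chunks whose sizes differ by at most one,
    a simple stand-in for balancing elements across subdomains."""
    base, extra = divmod(n_items, n_parts)
    return [base + (1 if i < extra else 0) for i in range(n_parts)]


if __name__ == "__main__":
    # Node counts taken from the abstract; the 2 x 12 node shape is assumed.
    for nodes in (50, 100, 150, 200):
        shape = ClusterShape(nodes=nodes, chips_per_node=2, cores_per_chip=12)
        layout = hierarchical_layout(shape)
        print(nodes,
              direct_partition_subdomains(shape),
              layout["level1_subdomains"],
              layout["level2_subdomains"])
```

Under these assumptions the count of coarse-grained subdomains at the chip layer is smaller than the one-subdomain-per-core count by a factor of `cores_per_chip`, which is the kind of reduction the abstract credits for the smaller interface problem and the lower process count.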

Authors: Miao Xinqiang (苗新强), Jin Xianlong (金先龙), Ding Junhong (丁峻宏)

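Affiliations: State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University; School of Mechanical Engineering, Shanghai Jiao Tong University; Shanghai Supercomputer Center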

Source: Chinese Science Bulletin (《科学通报》), 2017, Issue 13, pp. 1430-1438 (9 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

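Funding: National High Technology Research and Development Program of China (2012AA01A307); National Natural Science Foundation of China (11272214, 51475287); National Key Research and Development Program of China (2016YFB0201800)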

Keywords: multi-core cluster, finite element analysis, parallel computing, load balancing

Classification: TP338.6 [Automation and Computer Technology - Computer System Architecture]
