Intel® Math Kernel Library PARDISO* for Intel® Xeon Phi™ Manycore Coprocessor

Abstract: The paper describes an efficient direct method for solving a linear system Ax = b, where A is a sparse matrix, on the Intel® Xeon Phi™ coprocessor. The main challenge on such a system is to engage all available threads (about 240) while reducing OpenMP* synchronization overhead, which is very expensive for hundreds of threads. The method decomposes A into a product of lower-triangular, diagonal, and upper-triangular matrices and then solves the three resulting subsystems. The main idea is based on the hybrid parallel algorithm used in the Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters [1]. Our implementation exploits a static scheduling algorithm during the factorization step to reduce OpenMP synchronization overhead. To engage all available threads effectively, a three-level approach to parallelization is used. Furthermore, we demonstrate that our implementation can perform up to 100 times better on the factorization step and up to 65 times better in terms of overall performance on the 240 threads of the Intel® Xeon Phi™ coprocessor.
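
The decomposition-then-three-solves structure mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's sparse, multithreaded implementation: it assumes a small dense matrix, does no pivoting or reordering, and the function names `ldu_decompose` and `solve_ldu` are hypothetical, chosen only for this illustration.

```python
def ldu_decompose(a):
    """Factor a square matrix as A = L * D * U without pivoting.

    L is unit lower triangular, D is diagonal (returned as a list),
    and U is unit upper triangular. Assumes every pivot is nonzero.
    """
    n = len(a)
    work = [row[:] for row in a]  # working copy, overwritten by elimination
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        pivot = work[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = work[i][k] / pivot
            lower[i][k] = factor
            for j in range(k, n):
                work[i][j] -= factor * work[k][j]
    diag = [work[k][k] for k in range(n)]
    # Scale each row of the eliminated matrix by its pivot to get a unit diagonal.
    upper = [[work[i][j] / diag[i] if j >= i else 0.0 for j in range(n)]
             for i in range(n)]
    return lower, diag, upper


def solve_ldu(lower, diag, upper, b):
    """Solve (L * D * U) x = b as three successive subsystems."""
    n = len(b)
    # 1) Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    # 2) Diagonal solve: D z = y
    z = [y[i] / diag[i] for i in range(n)]
    # 3) Backward substitution: U x = z
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = z[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))
    return x


# Illustrative 3x3 system (made-up data, not from the paper).
A = [[4.0, -2.0, 1.0],
     [-2.0, 4.0, -2.0],
     [1.0, -2.0, 4.0]]
b = [3.0, 0.0, 9.0]
L, D, U = ldu_decompose(A)
x = solve_ldu(L, D, U, b)
print(x)
```

In a sparse direct solver the factorization is the expensive phase, while the three solves are comparatively cheap and can be repeated for new right-hand sides once the factors exist.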

Authors: Alexander Kalinkin, Anton Anders, Roman Anders

Affiliation: Intel Corporation

Source: Applied Mathematics, 2015, No. 8, pp. 1276-1281 (6 pages)

Keywords: Multifrontal Method; Direct Method; Sparse Linear System; HPC; OpenMP*; Intel® MKL; Intel® Xeon Phi™ Coprocessor

Classification number: TP39 [Automation and Computer Technology - Computer Application Technology]
