Abstract
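LOBPCG is a numerical eigenvalue method suited to large-scale sparse symmetric problems. This paper studies a parallel LOBPCG algorithm suited to the Sunway TaihuLight architecture. First, a hybrid parallel model based on master and slave cores is proposed. Next, a parallel algorithm for the sparse matrix-vector product is studied: techniques such as hiding communication between core groups and hiding communication within a core group improve program speed, and an algorithm is proposed that automatically tunes the amount of data buffered on the slave cores, so that it automatically approaches the best communication-hiding effect. A parallel algorithm for dense matrix multiplication on the Sunway TaihuLight architecture is then studied, with different matrix partitioning algorithms proposed for input matrices of different "shapes"; its speed is significantly better than that of other algorithm libraries. In eigenvalue tests on matrices of order up to 125 million using 936,000 computing cores, the implementation shows good scalability. We also tested the performance of this application on strongly correlated systems in condensed matter physics.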
LOBPCG is a numerical method for solving eigenvalue problems of sparse matrices. This paper discusses optimizations that cover all the main computations of LOBPCG. A parallel model of data and computation for the Sunway machine is proposed, followed by an efficient parallel algorithm for the sparse matrix-vector product, implemented with an automatically tuned data-buffering strategy. A parallel algorithm for dense matrix multiplication adapted to the Sunway architecture is then presented, which yields a significant performance improvement. The implementation is tested with up to roughly 1 million cores (936,000) on the Sunway machine.
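As a point of reference for the sparse matrix-vector product named in the abstract, the following is a minimal serial sketch in Python. It is not the paper's Sunway implementation: the CSR storage format, the function name `csr_matvec`, and the small test matrix are assumptions chosen for illustration, and no master/slave-core parallelism or communication hiding is modeled.

```python
def csr_matvec(n_rows, indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form (hypothetical helper).

    indptr[r]..indptr[r+1] delimits the nonzeros of row r;
    indices holds their column numbers and data their values.
    """
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row `row` with x over its nonzeros only.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Illustrative 3x3 symmetric tridiagonal matrix:
# [[ 2, -1,  0],
#  [-1,  2, -1],
#  [ 0, -1,  2]]
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

y = csr_matvec(3, indptr, indices, data, [1.0, 1.0, 1.0])
print(y)
```

Each output row depends only on that row's nonzeros and the entries of `x` they reference, which is why row blocks can in principle be distributed across cores; how the paper actually partitions and buffers the data is not specified in the abstract.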
Authors
于天禹
赵永华
赵莲
Yu Tianyu; Zhao Yonghua; Zhao Lian (Computer Network Information Center, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing 100190, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
Source
《数值计算与计算机应用》
2019, No. 4, pp. 291-309 (19 pages)
Journal on Numerical Methods and Computer Applications
Funding
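National Key R&D Program of China, "Research on Collaborative Development Tools and Environments for High-Performance Computing Application Software" (2017YFB0202202); National Key R&D Program of China, High-Performance Computing Special Project (2016YFB0201302)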