摘要
基于量子力学的密度泛函微扰理论(DFPT)可以用来计算分子和材料的多种物理化学性质,目前被广泛应用于新材料等领域的研究中;同时,异构众核处理器架构逐渐成为超算的主流。因此,针对异构众核处理器重新设计和优化DFPT程序以提升其计算效率,对物理化学性质的计算及其科学应用具有重要意义。文中对DFPT中一阶响应密度和一阶响应哈密顿矩阵的计算针对众核处理器体系结构进行了优化,并在新一代神威处理器上进行了验证。优化技术包括循环分块、离散访存处理和协同规约。其中,循环分块对任务进行划分从而由众核并行地执行;离散访存处理将离散访存转换为更高效的连续访存;协同规约解决了写冲突问题。实验结果表明,在一个核组上,优化后的程序性能较优化前提高了8.2~74.4倍,并且具有良好的强可扩展性和弱可扩展性。
Density-functional perturbation theory(DFPT)based on quantum mechanics can be used to calculate a variety of physicochemical properties of molecules and materials and is now widely used in the research of new materials.Meanwhile,heteroge-neous many-core processor architectures are becoming the mainstream of supercomputing.Therefore,redesigning and optimizing DFPT programs for heterogeneous many-core processors to improve their computational efficiency is of great importance for the computation of physicochemical properties and their scientific applications.In this work,the computation of first-order response density and first-order response Hamiltonian matrix in DFPT is optimized for many-core processor architecture and verified on the new generation Sunway processors.Optimization techniques include loop tiling,discrete memory access processing and colla-borative reduction.Among them,loop tiling divides tasks so that they can be executed by many cores in parallel;discrete memory access processing converts discrete accesses into more efficient continuous memory accesses;collaborative reduction solves the write conflict problem.Experimental results show that the performance of the optimized program improves by 8.2 to 74.4 times over the pre-optimization program on one core group,and has good strong scalability and weak scalability.
作者
罗海文
吴扬俊
商红慧
LUO Haiwen;WU Yangjun;SHANG Honghui(State Key Laboratory of Processors,Institute of Computing Technology,Chinese Acadamy of Science,Beijing 100190,China)
出处
《计算机科学》
CSCD
北大核心
2023年第6期1-9,共9页
Computer Science
基金
国家重点研发计划(2020YFB1709500)
国家自然科学基金(22003073)。
关键词
密度函数微扰理论
第一性原理计算
高性能计算
新一代神威异构众核处理器
Density-functional perturbation theory
First-principle calculation
High-performance computing
New generation Sunway heterogeneous many-core processor