Hybrid parallel optimization of the density matrix renormalization group method (密度矩阵重正化群的异构并行优化)

Cited by: 1

Abstract: The density matrix renormalization group (DMRG) is a numerical method that solves for the ground state of one-dimensional strongly correlated lattice models with very high accuracy, but it requires substantial computation and memory when applied to two-dimensional and quasi-two-dimensional problems. For these applications the number of DMRG kept states must generally be very large to reach reliable accuracy, which leads to a large number of matrix and vector operations and to prohibitively long run times without proper parallelization. Because the algorithm is sequential in nature, however, parallelizing DMRG is usually not straightforward. In this work, we propose a new hybrid parallelization strategy for the DMRG method that exploits the computing capability of both the central processing unit (CPU) and the graphics processing unit (GPU) of the computer. To keep as many DMRG states as possible within a limited GPU memory, we adopt the four-block formulation of the Hamiltonian rather than the two-block formulation. The latter consumes much more memory; it was used in an earlier pioneering work on the hybrid parallelization of the DMRG algorithm, where only a small number of DMRG kept states were available. Our parallel strategy focuses on the diagonalization of the Hamiltonian, which is the most time-consuming part of the whole DMRG procedure. We implement a hybrid parallelization of the diagonalization in which the required data are distributed over both the host memory and the GPU memory, and the data exchange between them is negligible in our data partitioning scheme. The matrix operations performed when the Hamiltonian acts on a wave function are likewise shared between the CPU and the GPU, and their distribution is determined by a load balancing strategy.

Taking the fermionic Hubbard model as an example, we examine the running performance of the hybrid parallelization strategy for different numbers of DMRG kept states and provide the corresponding performance benchmarks. On a 4-leg ladder, we exploit the conserved quantities associated with the U(1) symmetry of the model and a task scheduling based on good quantum numbers to further reduce the GPU memory cost. We obtain a moderate speedup from the hybrid parallelization over a wide range of DMRG kept states. In our example, a highly accurate ground-state energy is obtained by extrapolating the results for different numbers of kept states, and we observe charge stripes of the kind commonly seen experimentally in high-temperature superconductors. In this case, we keep 10^4 DMRG states and the GPU memory cost is less than 12 GB.
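The abstract does not spell out the load balancing rule, so the following is only a minimal sketch of one way such a split could look, not the authors' implementation. It assumes the Hamiltonian is stored as a sum of tensor-product terms, H = sum_k A_k ⊗ B_k, so that H acting on a wave function reshaped into a matrix psi reduces to sum_k A_k psi B_k^T. The names `term_cost`, `split_terms`, `apply_terms`, and the parameter `gpu_speedup` are hypothetical, the greedy rule is an illustrative choice, and both "devices" are simulated with NumPy on the CPU.

```python
import numpy as np

def term_cost(a, b):
    """Rough flop count of a @ psi @ b.T for square a (m x m) and b (n x n)."""
    m, n = a.shape[0], b.shape[0]
    return 2.0 * m * m * n + 2.0 * m * n * n

def split_terms(terms, gpu_speedup):
    """Greedy split of term indices between two devices.

    gpu_speedup is an assumed (hypothetical) ratio of GPU to CPU throughput.
    Each term goes to the device that would finish it earlier.
    """
    cpu_idx, gpu_idx = [], []
    t_cpu = t_gpu = 0.0
    # Place the most expensive terms first so the tail evens out the load.
    order = sorted(range(len(terms)), key=lambda k: -term_cost(*terms[k]))
    for k in order:
        c = term_cost(*terms[k])
        if t_cpu + c <= t_gpu + c / gpu_speedup:
            cpu_idx.append(k)
            t_cpu += c
        else:
            gpu_idx.append(k)
            t_gpu += c / gpu_speedup
    return cpu_idx, gpu_idx

def apply_terms(terms, indices, psi):
    """Partial result sum_{k in indices} A_k @ psi @ B_k.T (no dense H is built)."""
    out = np.zeros_like(psi)
    for k in indices:
        a, b = terms[k]
        out += a @ psi @ b.T
    return out

# Small random example; in a real code each share would run on its own device.
rng = np.random.default_rng(0)
m, n, n_terms = 6, 5, 8
terms = [(rng.standard_normal((m, m)), rng.standard_normal((n, n)))
         for _ in range(n_terms)]
psi = rng.standard_normal((m, n))

cpu_idx, gpu_idx = split_terms(terms, gpu_speedup=4.0)
h_psi = apply_terms(terms, cpu_idx, psi) + apply_terms(terms, gpu_idx, psi)

# Cross-check against the dense Hamiltonian (feasible only for tiny sizes).
h_dense = sum(np.kron(a, b) for a, b in terms)
reference = (h_dense @ psi.reshape(-1)).reshape(m, n)
print(cpu_idx, gpu_idx, np.allclose(h_psi, reference))
```

Because each device only needs the operators of its own terms plus a copy of psi, a split of this kind keeps the operator data resident on one side, which is consistent with the abstract's remark that data exchange between host and GPU memory is negligible.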

Authors: Chen Fu-Zhou (陈富州); Cheng Chen (程晨); Luo Hong-Gang (罗洪刚) (School of Physical Science and Technology, Lanzhou University, Lanzhou 730000, China; Beijing Computational Science Research Center, Beijing 100084, China)

Affiliations: School of Physical Science and Technology, Lanzhou University; Beijing Computational Science Research Center

Source: Acta Physica Sinica (《物理学报》), 2019, Issue 12, pp. 46-53 (8 pages). Indexed in SCIE, EI, CAS, CSCD, and the Peking University Core Journals list (北大核心).

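Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11674139, 11834005) and the Program for Changjiang Scholars and Innovative Research Team in University (Grant No. IRT-16R35).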

Keywords: density matrix renormalization group; strongly correlated lattice model; hybrid parallelization

Classification (CLC): O413 [Science: Theoretical Physics]; TP338.6 [Automation and Computer Technology: Computer System Architecture]

