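Abstract: [Objective] AI for Science methods are profoundly reshaping scientific computing. By combining physical models, artificial intelligence, and high-performance computing, they tackle the high-dimensional problems of traditional scientific computing through data fitting, extending the time and space scales of high-accuracy scientific computing by orders of magnitude and driving a shift in the research paradigm. [Methods] Focusing on molecular dynamics with first-principles accuracy, this paper proposes an HPC+AI-driven AI for Science computing platform. In response to the changes and challenges that AI for Science brings to the workflow, it describes the key technologies and processes for building such a platform from four aspects: scientific data generation and dataset preparation, configuration-space exploration and training-sample labeling, efficient training of AI for Science models, and efficient large-scale inference. [Results] Building on an integrated AI for Science workflow, and taking HPC+AI-driven molecular dynamics with first-principles accuracy as a representative application, the proposed platform introduces a Kalman-filter-based active learning strategy; improves the quasi-second-order training method for AI models, reducing training time from days to minutes; and uses fifth-order polynomial model compression to increase the system size handled in model inference by one order of magnitude on the same hardware, with time-to-solution improved by a factor of 3-9. [Conclusions] Integrating the above work yields an AI for Science computing platform that can be used for molecular dynamics calculations with first-principles accuracy. [Limitations and Outlook] AI for Science computing methods and workflows are still developing rapidly and face major challenges in high-accuracy data, more general AI models, and efficient computing methods; these will be important directions for future work.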
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2021YFB0300600, the National Natural Science Foundation of China under Grant Nos. 92270206, T2125013, 62032023, 61972377, T2293702, and 12274360, the Chinese Academy of Sciences Project for Young Scientists in Basic Research under Grant No. YSBR-005, the Network Information Project of Chinese Academy of Sciences under Grant No. CASWX2021SF-0103, and the Key Research Program of Chinese Academy of Sciences under Grant No. ZDBSSSW-WHC002.
Abstract: The growing demand for semiconductor device simulation poses a major challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and highly efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. In terms of algorithmic optimizations, the original all-band conjugate gradient algorithm is refined to achieve faster convergence, and mixed precision computing is adopted to increase overall efficiency. In terms of system-level optimizations, the original two-layer parallel structure is replaced by a coarse-grained parallel method. Optimization strategies such as multi-stream, kernel fusion, and redundant computation removal are proposed to further increase utilization of the computational power provided by the heterogeneous machines. As a result, our optimized LS3DF can scale to a system of 10 million silicon atoms, attaining a peak performance of 34.8 PFLOPS (21.2% of the peak). All the improvements can be adapted to next-generation supercomputers for larger simulations.