Abstract
To address the problem of discrete (irregular) memory access on unstructured grids in high-performance computing, this paper proposes a general many-core optimization algorithm based on a sorting approach. The algorithm is designed around the architectural features of the heterogeneous many-core processor SW26010, runs on the Chinese supercomputer Sunway TaihuLight, and aims to reduce random memory access in unstructured-grid computations. Based on the principle of mesh generation, the non-zero elements of the resulting sparse matrix are reordered in parallel in O(n) time. An internal mapping method then extends or transforms the computational vectors, converting fine-grained memory access into coarse-grained access free of write conflicts. Many-core optimization is applied to the flux calculation in several practical application cases. Experimental results show that, compared with the serial algorithm on the main core, the proposed algorithm achieves an average speedup of more than 10×.
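The abstract does not spell out how the reordering works, so the following is only a minimal sketch of one common way to obtain an O(n) sort with conflict-free accumulation. It is not the authors' implementation. It assumes a cell-centred finite-volume mesh whose interior faces are given as (owner, neighbour) cell pairs and whose face fluxes are already computed. Each face is expanded into two signed entries, one per adjacent cell. A counting sort by cell index then groups the entries into CSR-style segments in O(n_faces + n_cells) time. A worker that owns a contiguous range of cells reads one contiguous slice of entries and writes only its own cells, so no two workers update the same residual. All function names here (`build_cell_ordered_entries`, `accumulate_residual`, `scatter_residual`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_cell_ordered_entries(
    faces: Sequence[Tuple[int, int]], n_cells: int
) -> Tuple[List[int], List[int], List[int]]:
    """Group signed face entries by cell with a counting sort, O(n_faces + n_cells).

    Returns (offsets, face_idx, sign): the entries of cell c occupy
    positions offsets[c] .. offsets[c + 1] - 1 of face_idx and sign.
    """
    # Pass 1: each interior face contributes one entry to each adjacent cell.
    counts = [0] * n_cells
    for owner, neigh in faces:
        counts[owner] += 1
        counts[neigh] += 1

    # Pass 2: exclusive prefix sum gives each cell's segment start (CSR row pointer).
    offsets = [0] * (n_cells + 1)
    for c in range(n_cells):
        offsets[c + 1] = offsets[c] + counts[c]

    # Pass 3: place entries; the owner receives +flux, the neighbour receives -flux.
    cursor = list(offsets[:n_cells])
    face_idx = [0] * offsets[n_cells]
    sign = [0] * offsets[n_cells]
    for f, (owner, neigh) in enumerate(faces):
        for cell, s in ((owner, 1), (neigh, -1)):
            face_idx[cursor[cell]] = f
            sign[cursor[cell]] = s
            cursor[cell] += 1
    return offsets, face_idx, sign


def accumulate_residual(
    offsets: List[int],
    face_idx: List[int],
    sign: List[int],
    flux: Sequence[float],
    cell_begin: int,
    cell_end: int,
) -> List[float]:
    """Residuals for cells [cell_begin, cell_end); writes touch only those cells."""
    out: List[float] = []
    # This worker's entries occupy the contiguous range
    # offsets[cell_begin] .. offsets[cell_end] - 1, so they can be fetched in one bulk read.
    for c in range(cell_begin, cell_end):
        total = 0.0
        for k in range(offsets[c], offsets[c + 1]):
            total += sign[k] * flux[face_idx[k]]
        out.append(total)
    return out


def scatter_residual(
    faces: Sequence[Tuple[int, int]], flux: Sequence[float], n_cells: int
) -> List[float]:
    """Reference face loop; parallelising it over faces causes write conflicts."""
    res = [0.0] * n_cells
    for f, (owner, neigh) in enumerate(faces):
        res[owner] += flux[f]
        res[neigh] -= flux[f]
    return res
```

For example, with `faces = [(0, 1), (1, 2), (0, 2), (2, 3)]` and `flux = [1.0, 2.0, 4.0, 8.0]`, splitting the four cells between two workers as `[0, 2)` and `[2, 4)` and concatenating their outputs gives the same residual `[5.0, 1.0, 2.0, -8.0]` as the scatter loop. The flux reads remain indirect, but they are read-only and therefore conflict-free. On SW26010, each compute core could presumably fetch its slice of `face_idx` and `sign` with a single bulk DMA transfer instead of many small scattered accesses, which resembles the fine-to-coarse conversion the abstract describes.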
Authors
NI Hong; LIU Xin (National Research Centre of Parallel Computer Engineering and Technology, Beijing 100190, China)
Source
Computer Engineering, 2019, No. 6, pp. 45-51 (7 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
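Funding
National Key R&D Program of China, "Development of a Coupling Platform for Large-Scale, Multi-Model, Multi-Process Earth System Models" (2016YFA0602200)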
Keywords
discrete memory access
unstructured grid
flux calculation
heterogeneous many-core optimization
parallel sorting