
Multi-Core Optimization Technology of Unstructured Grid Based on Sunway TaihuLight (cited by: 6)
Abstract: To address the discrete (irregular) memory access problem of unstructured grids in high-performance computing, this paper proposes a general many-core optimization algorithm based on sorting. It targets the Chinese supercomputer Sunway TaihuLight and follows the architectural features of its heterogeneous many-core processor, SW26010, with the goal of reducing random memory accesses in unstructured-grid computations. Based on the principle of mesh partitioning, the non-zero elements of the generated sparse matrix are reordered in parallel in O(n) time. An internal mapping is then used to extend or transform the computational vectors, converting fine-grained memory accesses into coarse-grained accesses without write conflicts. The many-core optimization is applied to the flux calculation of several practical application cases. The results show that, compared with the serial algorithm on the main core, the proposed algorithm achieves an average speedup of more than 10 times.
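The abstract names two ideas: an O(n) reordering of the non-zero elements, and a conversion of scattered updates into block-wise, conflict-free ones. The record does not give the authors' actual algorithm, so the following is only a minimal sketch under stated assumptions. It uses a plain counting sort keyed on the row block that each non-zero writes to, followed by a per-block accumulation into a small local buffer. The names `reorder_by_block`, `blocked_accumulate`, and `block_size` are hypothetical. The sequential loop over blocks stands in for independent compute cores. The vector `x` is still read with scattered indices here, so the paper's mapping of the input vector is not reproduced.

```python
from typing import List, Tuple

# One non-zero of the sparse matrix: (row, col, value).
Edge = Tuple[int, int, float]


def reorder_by_block(edges: List[Edge], n_rows: int,
                     block_size: int) -> Tuple[List[Edge], List[int]]:
    """Counting sort of non-zeros by destination row block (assumed key).

    Runs in O(n + n_blocks) time and is stable within each block.
    Returns the reordered non-zeros and the start offset of every block.
    """
    n_blocks = (n_rows + block_size - 1) // block_size
    # Pass 1: count the non-zeros that fall into each row block.
    counts = [0] * n_blocks
    for row, _, _ in edges:
        counts[row // block_size] += 1
    # Exclusive prefix sum: offsets[b] is where block b starts.
    offsets = [0] * (n_blocks + 1)
    for b in range(n_blocks):
        offsets[b + 1] = offsets[b] + counts[b]
    # Pass 2: scatter each non-zero into its block's segment.
    cursor = offsets[:-1].copy()
    ordered: List[Edge] = [(0, 0, 0.0)] * len(edges)
    for e in edges:
        b = e[0] // block_size
        ordered[cursor[b]] = e
        cursor[b] += 1
    return ordered, offsets


def blocked_accumulate(ordered: List[Edge], offsets: List[int],
                       x: List[float], n_rows: int,
                       block_size: int) -> List[float]:
    """Accumulate y[row] += value * x[col], one row block at a time.

    Each block updates only its own slice of y, so blocks could be
    handled by different cores without write conflicts.
    """
    y = [0.0] * n_rows
    for b in range(len(offsets) - 1):
        lo = b * block_size
        hi = min(lo + block_size, n_rows)
        local = [0.0] * (hi - lo)  # stands in for a core-local buffer
        for row, col, val in ordered[offsets[b]:offsets[b + 1]]:
            local[row - lo] += val * x[col]
        y[lo:hi] = local  # one contiguous write-back per block
    return y


edges = [(3, 0, 1.0), (0, 1, 2.0), (2, 2, 3.0), (1, 3, 4.0), (0, 0, 5.0)]
x = [1.0, 2.0, 3.0, 4.0]
ordered, offsets = reorder_by_block(edges, n_rows=4, block_size=2)
y = blocked_accumulate(ordered, offsets, x, n_rows=4, block_size=2)
```

In this sketch the reordering groups all updates to rows 0 and 1 ahead of those to rows 2 and 3, so every block reads one contiguous segment of non-zeros and writes one contiguous slice of the result.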
Authors: NI Hong (倪鸿); LIU Xin (刘鑫). National Research Centre of Parallel Computer Engineering and Technology, Beijing 100190, China
Source: Computer Engineering (《计算机工程》), 2019, No. 6, pp. 45-51 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
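Funding: National Key Research and Development Program of China, "Development of a Coupling Platform for Large-Scale, Multi-Model, Multi-Process Earth System Models" (2016YFA0602200)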
Keywords: discrete memory access; unstructured grid; flux calculation; heterogeneous many-core optimization; parallel sorting
