Analytical Tile Size Model for Matrix Multiplication for Sunway Heterogeneous Many-core Architecture
Abstract Compiler optimization of matrix multiplication addresses the difficulty of program optimization caused by the complex architecture and storage hierarchy of Sunway heterogeneous many-core processors. In this process, the loop tile size is extremely important to the optimization effect. This paper proposes an analytical tile size model for matrix multiplication on the SW26010-Pro, the latest generation of Sunway heterogeneous many-core processors, aiming to provide analytical model support for the computation decomposition step of compiler optimization for matrix multiplication. By analyzing the storage space and the data transfer process on the Sunway processor, the model determines the optimal loop tile size and predicts the data transfer time and program execution time. Tests show that the model obtains the optimal tile size under the storage space limitation, and that the average accuracy of program execution time prediction reaches 96.87%.
Authors TAO Xiaohan; PANG Jianmin; ZHU Yu; WANG Boyang; XU Jinlong (Information Engineering University, Zhengzhou 450001, China; Zhengzhou University, Zhengzhou 450001, China)
Source Journal of Information Engineering University, 2023, No. 1, pp. 65-71 (7 pages)
Funding National Natural Science Foundation of China (61702546).
Keywords heterogeneous many-core architecture; matrix multiplication; tile size; analytical model
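
The abstract describes choosing a loop tile size under a storage space limit, but it does not give the model's equations. The sketch below is therefore not the paper's model. It is a generic illustration of the idea, assuming a hypothetical 256 KiB local store, double-precision elements, a small set of power-of-two candidate tile sizes, and a simple cost that counts elements moved between main memory and the local store while a tile of C stays resident across the k loop. All function names and parameters are illustrative.

```python
from itertools import product
from math import ceil


def tile_footprint(tm, tn, tk, elem_bytes):
    """Bytes needed to hold one tile each of A (tm x tk), B (tk x tn), C (tm x tn)."""
    return (tm * tk + tk * tn + tm * tn) * elem_bytes


def traffic_elems(M, N, K, tm, tn, tk):
    """Elements transferred for C = A * B, assuming the C tile stays in
    local memory while the k loop streams tiles of A and B (an assumption)."""
    tiles_m, tiles_n, tiles_k = ceil(M / tm), ceil(N / tn), ceil(K / tk)
    a_loads = tiles_m * tiles_n * tiles_k * tm * tk
    b_loads = tiles_m * tiles_n * tiles_k * tk * tn
    c_stores = tiles_m * tiles_n * tm * tn
    return a_loads + b_loads + c_stores


def best_tile(M, N, K, capacity_bytes, elem_bytes=8,
              candidates=(8, 16, 32, 64, 128, 256)):
    """Exhaustively pick (tm, tn, tk) that fits in capacity_bytes and
    minimizes traffic; ties are broken in favor of the larger tk.
    Returns None when no candidate fits."""
    best_key, best = None, None
    for tm, tn, tk in product(candidates, repeat=3):
        if tm > M or tn > N or tk > K:
            continue  # tile larger than the matrix dimension
        if tile_footprint(tm, tn, tk, elem_bytes) > capacity_bytes:
            continue  # violates the storage space limit
        key = (traffic_elems(M, N, K, tm, tn, tk), -tk)
        if best_key is None or key < best_key:
            best_key, best = key, (tm, tn, tk)
    return best


if __name__ == "__main__":
    # Hypothetical 256 KiB local store, 1024 x 1024 x 1024 problem.
    print(best_tile(1024, 1024, 1024, 256 * 1024))
```

With these assumptions the search favors large, square tiles of C, because the traffic for A and B shrinks as tn and tm grow, while the capacity limit bounds how large they can be. A model for real hardware would also need to account for transfer latency, bandwidth, and buffering, which this sketch ignores.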