Abstract
Compiler optimization for matrix multiplication eases the difficulty of program optimization caused by the complex architecture and memory hierarchy of Sunway heterogeneous many-core processors, and within this process the loop tile size is critical to the optimization result. This paper proposes an analytical tile size model for matrix multiplication on the latest-generation Sunway processor, the SW26010-Pro heterogeneous many-core processor, aiming to provide analytical support for computation decomposition in compiler optimization of matrix multiplication. By analyzing the storage space and the data transfer process on the Sunway processor, the model determines the optimal loop tile size and predicts both the data transfer time and the program execution time. Tests show that the model obtains the optimal loop tile size under the storage space constraint, and that the average accuracy of program execution time prediction reaches 96.87%.
Authors
TAO Xiaohan (陶小涵)
PANG Jianmin (庞建民)
ZHU Yu (朱雨)
WANG Boyang (王博漾)
XU Jinlong (徐金龙)
Affiliations: Information Engineering University, Zhengzhou 450001, China; Zhengzhou University, Zhengzhou 450001, China
Source
Journal of Information Engineering University (《信息工程大学学报》)
2023, No. 1, pp. 65-71 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (61702546).
Keywords
heterogeneous many-core processor
matrix multiplication
tile size
analytical model