Abstract
Matrix multiplication is a basic operation in modern signal processing, and improving the parallel processing of data is of practical importance for raising its performance. This study performs task scheduling and resource allocation for compute-intensive matrix multiplication of different dimensions on a NoC-based multi-core system, and implements several mapping schemes suited to different matrix multiplications, reaching a peak performance of 5078 MFLOPS. The designed operation units are relatively independent and reconfigurable, so the design scales well and generalizes to matrix multiplication of arbitrary dimensions. It thereby addresses the low efficiency and poor scalability of general-purpose matrix multipliers, whose fixed structure is limited by I/O bandwidth and computing resources. Analysis of experimental results for matrix multiplications of different dimensions confirms the performance and correctness of the design.
Authors
WANG Yang (汪杨)
WANG Xiaolei (王晓蕾)
YUAN Ziang (袁子昂)
YUAN Ruming (袁儒明)
School of Electronic Science and Applied Physics, Hefei University of Technology, Hefei 230009, China
Source
Electronic Science and Technology (《电子科技》)
2021, Issue 5, pp. 54-60 (7 pages)
Funding
National Natural Science Foundation of China (61874156).
Keywords
matrix multiplication
parallel computing
NoC multi-core
compute-intensive
task scheduling
resource allocation
generality
I/O bandwidth