Multi-threaded Optimization of Complex Matrix Multiplication on Loongson-3A Architecture
Cited by: 1
Abstract: The BLAS library provides two classes of routines: real and complex. Matrix multiplication is the core routine of BLAS, and many other BLAS routines call it in their implementations. Taking the characteristics of the Loongson-3A architecture into account, this paper analyzes the computation of matrix multiplication and chooses to partition the matrices into blocks first and then divide the tasks among threads. This reduces the amount of data copied and improves the reuse of the copied data, which cuts data movement between cache and main memory. The computation within each child thread is further optimized with loop unrolling, instruction scheduling, and data blocking. The multi-threaded speed of the optimized ZGEMM routine is twice that of the ATLAS library.
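Source: Electronic Technology (Shanghai), 2011, No. 12, pp. 1-3 (3 pages)
Funding: National 863 Program, "Porting and Development of System Software for Multi-core Loongson Processors" (2008AA01902); National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software, "Development of Communication and Mathematics Libraries Based on Loongson-3" (2009ZX01028-002-003-005)
Keywords: Basic Linear Algebra Subprograms (BLAS); ZGEMM; task partition; multi-threading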
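
The "block first, then divide tasks among threads" idea from the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `zgemm_blocked`, `worker`, and `multiply_block`, the `block` and `threads` parameters, and the choice of row panels as the unit of work do not come from the paper. The sketch computes only C = A·B (full ZGEMM computes C ← α·op(A)·op(B) + β·C) and does not reproduce the paper's data-copy strategy, loop unrolling, or instruction scheduling. Python threads share one interpreter lock, so it shows the partitioning pattern only, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_block(A, B, C, i0, i1, j0, j1, k0, k1):
    """Accumulate A[i0:i1, k0:k1] * B[k0:k1, j0:j1] into C[i0:i1, j0:j1]."""
    for i in range(i0, i1):
        row_a, row_c = A[i], C[i]
        for k in range(k0, k1):
            a = row_a[k]
            row_b = B[k]
            for j in range(j0, j1):
                row_c[j] += a * row_b[j]


def worker(A, B, C, i0, i1, block):
    """One task: compute every block of C that lies in rows i0..i1-1."""
    n, p = len(B), len(B[0])
    for k0 in range(0, n, block):
        for j0 in range(0, p, block):
            multiply_block(A, B, C, i0, i1,
                           j0, min(j0 + block, p),
                           k0, min(k0 + block, n))


def zgemm_blocked(A, B, block=2, threads=4):
    """Hypothetical helper: C = A * B for complex matrices stored as lists of lists."""
    m, p = len(A), len(B[0])
    C = [[0j] * p for _ in range(m)]
    # Step 1 (blocking): cut the rows of C into panels of height `block`.
    panels = [(i0, min(i0 + block, m)) for i0 in range(0, m, block)]
    # Step 2 (task partition): one task per panel. Panels write disjoint
    # rows of C, so the threads need no locking.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker, A, B, C, i0, i1, block)
                   for i0, i1 in panels]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return C
```

Because each task owns a disjoint set of output rows, the result does not depend on how the panels are scheduled across threads.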
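References (8)

  • 1 Institute of Computing Technology, Chinese Academy of Sciences. Loongson 3A Processor User Manual, Version 0.1 [R]. 2009.
  • 2 Lawson C L, Hanson R J, Kincaid D R, et al. Basic linear algebra subprograms for Fortran usage [J]. ACM Transactions on Mathematical Software, 1979, 5(3): 308-323.
  • 3 Whaley R C, Petitet A, Dongarra J J. Automated empirical optimization of software and the ATLAS project [J]. Parallel Computing, 2001, 27(1/2): 3-35.
  • 4 Gunnels J A, Henry G M, van de Geijn R A. A family of high-performance matrix multiplication algorithms [C]. Proceedings of the International Conference on Computational Science, Part I, May 2001.
  • 5 Gu Naijie, Li Kai, Chen Guoliang, Wu Chao. BLAS library optimization based on the Loongson 2F architecture [J]. Journal of University of Science and Technology of China, 2008, 38(7): 854-859.
  • 6 Goto K. Anatomy of high-performance matrix multiplication [J]. ACM Transactions on Mathematical Software, 2007, 34(3): 1-24.
  • 7 Li Kai. Optimization of standard function libraries based on the Loongson architecture [D]. Hefei: University of Science and Technology of China, 2009.
  • 8 Su Bo, Li Kai, Xu Zhiguang, He Songsong. Memory access optimization on Loongson 2F [J]. Computer Systems & Applications, 2010, 19(1): 171-175.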