
Research on Optimizing Matrix Multiplication on a Vector-Extended Multi-Core Processor (cited by: 4)

Optimization of matrix multiplication based on a multi-core architecture extended with vector units
Abstract: Matrix multiplication is optimized and evaluated on the GODSON-3B eight-core processor platform. Because the three matrices A, B, and C in a matrix multiplication each have their own memory access characteristics, a different method is used for each to optimize its memory access behavior and hide memory access time. The optimized matrix multiplication reaches 122 Gflops, an efficiency of 95.3%.
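As a quick sanity check on the reported figures, the sketch below computes the peak throughput they imply. It assumes that "efficiency" means achieved throughput divided by theoretical peak, which the abstract does not state explicitly; the function name `implied_peak_gflops` is hypothetical and used only for illustration.

```python
def implied_peak_gflops(achieved_gflops: float, efficiency: float) -> float:
    """Return the peak throughput implied by an achieved rate and an efficiency.

    Assumption (not stated in the abstract): efficiency = achieved / peak,
    with efficiency given as a fraction in (0, 1].
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return achieved_gflops / efficiency


# Figures reported in the abstract: 122 Gflops at 95.3% efficiency.
peak = implied_peak_gflops(122.0, 0.953)
print(f"implied peak: {peak:.1f} Gflops")
```

Under that assumption the two reported numbers correspond to a peak of roughly 128 Gflops for the eight-core chip.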
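Source: Journal of University of Science and Technology of China (JUSTC), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, No. 2, pp. 173-182 (10 pages)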
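Funding: Supported by the National Natural Science Foundation of China (60736012, 60921002), the National Key Basic Research Program of China (973 Program) (2005CB321600), and the National High Technology Research and Development Program of China (863 Program) (2008AA110901)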
Keywords: multi-core; vector extension; register file; matrix multiplication

