
FT2000 Processor-Based Memory Performance Testing and Optimization (cited by: 3)
Abstract: STREAM is a benchmark program widely used to measure the memory performance of computers, and getting it to perform well on the FT2000/4 processor is a challenging task. Testing each parameter of the STREAM tool and setting the parameter values appropriately can improve program performance substantially. The test results show that, after optimization, the peak memory-access performance of the parallel program reaches 13.93 GB/s, which is 56.34% higher than the peak memory-access performance before optimization.
Authors: LI Jing-Ze (李竞择); GOU Xi-Dong (苟喜东); FAN Cheng-Yu (范承宇) (China Ordnance Equipment Group Automation Institute Co., Ltd., Mianyang, Sichuan 621000, China)
Source: Development & Innovation of Machinery & Electrical Products (《机电产品开发与创新》), 2022, No. 3, pp. 110-113 (4 pages)
Keywords: STREAM; memory performance; FT2000; Kunlun firmware; Kylin OS
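
The abstract reports bandwidth figures without showing how a STREAM-style number is derived. The following is a minimal sketch, not the authors' code: it reproduces the Triad kernel and the byte-counting convention used by stream.c (8-byte doubles, 24 bytes moved per element for Triad), and reports the best time excluding the first pass. The names `bandwidth_gb_s`, `best_triad_time`, and `implied_baseline` are hypothetical helpers introduced here for illustration. The arguments `n` and `ntimes` loosely correspond to STREAM's `STREAM_ARRAY_SIZE` and `NTIMES` macros; which parameters the paper actually tuned is not stated in the abstract. The pre-optimization figure printed at the end is only what the abstract's two numbers imply arithmetically, not a value reported by the source.

```python
import time
from array import array

# Bytes moved per array element for each STREAM kernel, following the
# counting convention in stream.c (8-byte doubles, one transfer per operand).
BYTES_PER_ELEMENT = {"copy": 16, "scale": 16, "add": 24, "triad": 24}


def bandwidth_gb_s(kernel, n, seconds):
    """Bandwidth in GB/s (10^9 bytes/s) for one pass over arrays of n elements."""
    return BYTES_PER_ELEMENT[kernel] * n / seconds / 1e9


def triad(a, b, c, q):
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def best_triad_time(n, ntimes=3):
    """Run Triad ntimes; return the shortest time, skipping the first pass."""
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    times = []
    for _ in range(ntimes):
        start = time.perf_counter()
        triad(a, b, c, 3.0)
        times.append(time.perf_counter() - start)
    # The first pass is treated as warm-up when more than one pass was run.
    return min(times[1:]) if ntimes > 1 else times[0]


def implied_baseline(after, gain_pct):
    """Value before a gain of gain_pct percent that results in `after`."""
    return after / (1.0 + gain_pct / 100.0)


if __name__ == "__main__":
    n = 200_000
    t = best_triad_time(n)
    # Interpreter overhead dominates here, so this is not a hardware measurement.
    print(f"Triad: {bandwidth_gb_s('triad', n, t):.4f} GB/s")
    # Assumption: derived from the abstract's 13.93 GB/s and 56.34% figures.
    print(f"Implied pre-optimization peak: {implied_baseline(13.93, 56.34):.2f} GB/s")
```

A pure-Python loop is bound by the interpreter rather than by memory, so the bandwidth it prints only illustrates the formula. A real measurement on FT2000/4 would use the compiled, OpenMP-enabled stream.c.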