
Loop-level speculative parallelism analysis of kernel programs in TACLeBench
Abstract Thread-Level Speculation (TLS) can exploit the parallel execution potential of programs and improve the utilization of multi-core resources, but the kernel benchmarks in TACLeBench have not yet been effectively analyzed for TLS parallelization. To address this, a profiling scheme and a profiling tool for loop-level speculative execution were designed. Seven representative TACLeBench kernel benchmarks were selected. First, an initial analysis of each program was performed, and loop identifiers were inserted into its hot code fragments. Second, these fragments were cross-compiled, data on the speculative threads and memory addresses were recorded, and the maximum potential loop-level parallelism was profiled. Finally, the effects on speedup of the programs' runtime characteristics (thread granularity, parallelizable coverage, dependence characteristics) and of their source code were discussed. Experimental results show that: 1) programs of this type are suitable for TLS acceleration: compared with serial execution, most programs achieve a speedup above 2 under speculative execution of loop structures, with a maximum speedup of 20.79; 2) when TLS is used to accelerate the TACLeBench kernel programs, most applications can make effective use of 4 to 16 cores.
Authors MENG Huiling (孟慧玲); WANG Yaobin (王耀彬); LI Ling (李凌); YANG Yang (杨洋); WANG Xinyi (王欣夷); LIU Zhiqin (刘志勤) (School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China; Sichuan Institute of Computer Sciences, Chengdu, Sichuan 610041, China)
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core, 2021, No. 9, pp. 2652-2657 (6 pages)
Funding General Program of the National Natural Science Foundation of China (61672438).
Keywords Thread-Level Speculation (TLS); multi-core; parallel; TACLeBench; kernel benchmark
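
The abstract describes recording memory-address data per speculative thread and relating parallelizable coverage to speedup. The following is a minimal Python sketch of those two ideas, not the authors' profiling tool: the trace format, the function names `cross_iteration_raw` and `amdahl_speedup`, and the choice of Amdahl's law as the coverage model are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical trace record: (loop iteration index, "R" or "W", memory address).
# The trace is assumed to be in sequential program order.
Access = Tuple[int, str, int]


def cross_iteration_raw(trace: Iterable[Access]) -> Set[Tuple[int, int]]:
    """Return (producer_iteration, consumer_iteration) pairs in which an
    iteration reads an address last written by a different, earlier iteration
    (a read-after-write dependence that a speculative thread could violate)."""
    last_writer: Dict[int, int] = {}  # address -> iteration that last wrote it
    deps: Set[Tuple[int, int]] = set()
    for iteration, kind, addr in trace:
        if kind == "R":
            writer = last_writer.get(addr)
            if writer is not None and writer != iteration:
                deps.add((writer, iteration))
        elif kind == "W":
            last_writer[addr] = iteration
    return deps


def amdahl_speedup(coverage: float, loop_speedup: float) -> float:
    """Whole-program speedup when a fraction `coverage` of serial execution
    time is sped up by `loop_speedup` (Amdahl's law)."""
    return 1.0 / ((1.0 - coverage) + coverage / loop_speedup)
```

A possible use would be to feed `cross_iteration_raw` the accesses recorded inside one marked loop and treat iterations with no incoming dependence pairs as candidates for speculative execution, while `amdahl_speedup` bounds how much a given loop speedup can help when only part of the program is covered.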