A Loop Selection Approach Based on Performance Prediction for Speculative Multithreading

Cited by: 7

Abstract: Thread-Level Speculation (TLS) is a thread-level automatic parallelization technique for accelerating sequential programs on multi-core processors. Loops have a regular structure and account for a large share of execution time, which makes them ideal candidates for exploiting parallelism. However, it is difficult to decide which loops should be parallelized to improve a program's speedup. To address this problem, this paper proposes a loop selection approach based on performance prediction. Profiling information is gathered from a pre-execution of the program on a training input set and combined with various speculation factors to build a performance prediction model for loops. The prediction quantitatively estimates the speedup of running a loop speculatively in parallel and decides whether that loop should be parallelized at runtime. Experimental results show that the approach effectively predicts the parallelism available in loops and, based on the predictions, accurately selects the loops that benefit from speculative parallelization; on the Olden benchmarks, speedup improves by 12.34% on average.
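The abstract outlines a profile-then-predict workflow without giving the model itself. The following is a minimal sketch of how such a selection step could look, not the paper's actual model: the `LoopProfile` fields, the cost formula in `predicted_speedup`, and the `threshold` parameter are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LoopProfile:
    """Hypothetical per-loop profile gathered from a training run."""
    name: str
    iterations: int        # iterations observed during pre-execution
    iter_cycles: float     # average cycles per iteration
    squash_prob: float     # assumed fraction of iterations re-executed after misspeculation
    spawn_overhead: float  # assumed cycles to spawn and commit one speculative thread


def predicted_speedup(p: LoopProfile, cores: int) -> float:
    """Estimate speedup under a deliberately simple, assumed cost model."""
    sequential = p.iterations * p.iter_cycles
    # Each speculative iteration pays its own work, the expected
    # re-execution cost, and the spawn/commit overhead.
    per_iter = p.iter_cycles * (1.0 + p.squash_prob) + p.spawn_overhead
    # Iterations are assumed to run in rounds of `cores` threads.
    rounds = math.ceil(p.iterations / cores)
    return sequential / (rounds * per_iter)


def select_loops(profiles, cores, threshold=1.0):
    """Keep only loops whose predicted speedup exceeds the threshold."""
    return [p.name for p in profiles if predicted_speedup(p, cores) > threshold]


hot = LoopProfile("hot_loop", iterations=1000, iter_cycles=200.0,
                  squash_prob=0.1, spawn_overhead=20.0)
tiny = LoopProfile("tiny_loop", iterations=1000, iter_cycles=10.0,
                   squash_prob=0.5, spawn_overhead=40.0)
selected = select_loops([hot, tiny], cores=4)
```

In this sketch a loop with a small body and frequent misspeculation is rejected because the overhead outweighs the parallel gain, which mirrors the abstract's point that only loops with a predicted benefit should be run speculatively.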

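Authors: LIU Bin (刘斌), ZHAO Yinliang (赵银亮), HAN Bo (韩博), LI Yuxiang (李玉祥), JI Shuo (吉烁), FENG Boqin (冯博琴), WU Wanjie (武万杰)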

Affiliation: Department of Computer Science and Technology, Xi'an Jiaotong University

Source: Journal of Electronics & Information Technology (《电子与信息学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 11, pp. 2768-2774 (7 pages)

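Funding: National Natural Science Foundation of China (61173040); National "863" Program (2012AA011003); Specialized Research Fund for the Doctoral Program of Higher Education (20130201110012)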

Keywords: Parallel processing; Thread-Level Speculation (TLS); Loop selection; Performance prediction

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]
