
A Loop Selection Approach Based on Performance Prediction for Speculative Multithreading

Cited by: 7
Abstract: Thread-Level Speculation (TLS) is a thread-level automatic parallelization technique for accelerating sequential programs on multi-core processors. Loops have a regular structure and account for a large share of execution time, which makes them ideal candidates for exploiting parallelism. However, it is difficult to decide which loops should be parallelized to improve the overall speedup of a program. To address this problem, this paper proposes a loop selection approach based on performance prediction. Using an input training set, the approach gathers profiling information during program pre-execution and combines it with various speculation factors to build a performance prediction model for loops. The prediction quantitatively estimates the speedup of speculatively parallelizing each loop and decides whether the loop is suitable for parallel execution at run time. Experimental results show that the approach effectively predicts the parallelism available in loops under speculative execution and accurately selects the loops that benefit from speculative parallelization. On the Olden benchmarks, speedup improves by 12.34% on average.
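Source: Journal of Electronics & Information Technology (《电子与信息学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 11, pp. 2768-2774 (7 pages)
Funding: National Natural Science Foundation of China (61173040); National "863" Program (2012AA011003); Specialized Research Fund for the Doctoral Program of Higher Education (20130201110012)
Keywords: Parallel processing; Thread-Level Speculation (TLS); Loop selection; Performance prediction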
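The abstract describes the selection step only in outline: profile each loop, predict its speculative speedup, and parallelize only the loops predicted to benefit. The following is a minimal sketch of that decision rule, not the paper's model. The cost formula, the `LoopProfile` fields, and the function names are all illustrative assumptions, since the record does not give the actual model or its speculation factors.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    """Hypothetical per-loop profiling record (field names are assumptions)."""
    name: str
    iterations: int          # iteration count observed on the training input
    cycles_per_iter: float   # average sequential cycles per iteration
    misspec_rate: float      # fraction of iterations squashed by misspeculation
    spawn_overhead: float    # cycles to spawn and commit one speculative thread


def predicted_speedup(p: LoopProfile, cores: int) -> float:
    """Toy estimate: sequential time divided by predicted speculative time.

    Assumption: successful iterations are spread evenly over the cores,
    while squashed iterations pay for the wasted attempt plus a serial rerun.
    """
    sequential = p.iterations * p.cycles_per_iter
    ok = p.iterations * (1.0 - p.misspec_rate)
    squashed = p.iterations * p.misspec_rate
    parallel = (
        ok * (p.cycles_per_iter + p.spawn_overhead) / cores
        + squashed * (2.0 * p.cycles_per_iter + p.spawn_overhead)
    )
    return sequential / parallel


def select_loops(profiles, cores, threshold=1.0):
    """Keep only the loops whose predicted speedup exceeds the threshold."""
    return [p.name for p in profiles if predicted_speedup(p, cores) > threshold]


if __name__ == "__main__":
    loops = [
        LoopProfile("independent_loop", 1000, 100.0, 0.0, 0.0),
        LoopProfile("dependent_loop", 1000, 100.0, 0.9, 20.0),
    ]
    for loop in loops:
        print(loop.name, round(predicted_speedup(loop, cores=4), 3))
    print("selected:", select_loops(loops, cores=4))
```

In this toy model a loop with frequent misspeculation is predicted to run slower than its sequential version, so it is left serial. This mirrors the abstract's point that only loops with a predicted gain should be speculatively parallelized.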


