Correlated Software Prediction for Indirect Branch in Dynamic Translation Systems (动态翻译系统中的间接转移关联软件预测算法)
Cited by: 1
Abstract: A dynamic translation system must perform an address translation each time an indirect branch instruction executes, which makes indirect branch handling one of the main sources of performance overhead. Translation systems without special hardware support commonly use software prediction to reduce the cost of address translation, but its low prediction accuracy limits the overall performance gain. This paper analyzes the performance bottleneck of software prediction and proposes low-overhead correlated software prediction (LOCSP), a mechanism that exploits branch correlation to improve prediction accuracy significantly. LOCSP uses code replicas to distinguish the different branch scenarios of an indirect branch instruction: the dynamic execution paths that reach the instruction are separated into multiple non-overlapping code cache replicas, and each replica is given its own software prediction chain. Correlated prediction is thereby achieved without increasing the dynamic instruction count. LOCSP also classifies indirect branch instructions by dynamic profiling and applies correlated prediction only to hot, hard-to-predict instructions, which further reduces prediction overhead. Experiments show that, compared with software prediction, LOCSP raises the average prediction accuracy from 58.9% to 82.2% and reduces the overall performance overhead of the translation system by 19.3% on average and by up to 41.9%, while increasing static code size by only 2.4% on average. LOCSP can also be combined with other optimization techniques for handling indirect branches.
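Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 3, pp. 661-671 (11 pages)
Funding: National Science and Technology Major Project of China ("核高基": core electronic devices, high-end general-purpose chips, and basic software), grant 2009ZX01029-001-002
Keywords: dynamic translation; indirect branch; software prediction; code duplication; correlated prediction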
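
The idea summarized in the abstract, one prediction chain per code replica instead of one chain shared by all execution paths, can be illustrated with a small simulation. This is a minimal sketch and not the authors' implementation: the class `PredictionChain`, the functions `run_shared`, `run_replicated`, and `make_trace`, the chain length of 1, the fill-on-miss policy, and the synthetic trace are all assumptions made for illustration. The accuracies it prints come from that synthetic trace and are unrelated to the 58.9% and 82.2% reported in the paper.

```python
class PredictionChain:
    """Hypothetical model of an inline software prediction chain.

    The chain holds up to max_len predicted targets. A lookup hits when the
    actual target is already in the chain; on a miss, the target is appended
    if there is room (an assumed fill-on-miss policy with no replacement).
    """

    def __init__(self, max_len=1):
        self.max_len = max_len
        self.targets = []
        self.hits = 0
        self.total = 0

    def lookup(self, target):
        self.total += 1
        if target in self.targets:
            self.hits += 1
            return True
        if len(self.targets) < self.max_len:
            self.targets.append(target)
        return False


def run_shared(trace, max_len=1):
    """Plain software prediction: every path shares one chain."""
    chain = PredictionChain(max_len)
    for _path, target in trace:
        chain.lookup(target)
    return chain.hits / chain.total


def run_replicated(trace, max_len=1):
    """Correlated prediction via replicas: one chain per incoming path.

    The path label stands in for "which code replica is executing". In a
    real translator that choice is fixed by control flow, so no extra
    instructions are needed to select the chain at run time.
    """
    chains = {}
    for path, target in trace:
        chains.setdefault(path, PredictionChain(max_len)).lookup(target)
    hits = sum(c.hits for c in chains.values())
    total = sum(c.total for c in chains.values())
    return hits / total


def make_trace(rounds=100):
    """Synthetic trace: three paths reach the same indirect branch, and
    each path always jumps to its own target (perfect correlation)."""
    targets = {"A": 0x1000, "B": 0x2000, "C": 0x3000}
    return [(p, targets[p]) for _ in range(rounds) for p in ("A", "B", "C")]


if __name__ == "__main__":
    trace = make_trace()
    print(f"shared chain accuracy:      {run_shared(trace):.2%}")
    print(f"per-replica chain accuracy: {run_replicated(trace):.2%}")
```

With a shared chain of length 1, only the path whose target was recorded first ever hits. With one chain per replica, each path misses once and then hits every time. A longer shared chain would narrow the gap at the cost of more comparisons per executed branch, which is the kind of dynamic-instruction overhead the replica approach is meant to avoid.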