
Software Pipelining with Cache Profiling Information

Abstract: Software pipelining is an important instruction scheduling technique that speeds up loops by overlapping the execution of instructions from different loop iterations. As the gap between processor speed and memory speed keeps widening, memory access instructions, especially those that cause cache misses, have increasingly become the bottleneck to higher system performance. Because the latency of these instructions is not fixed, it is essential for software pipelining to predict and hide it. Unlike previous approaches to predicting memory access latency, this work introduces cache profiling: profile information collected at run time is used to predict memory access latency, and instructions are scheduled accordingly. Increasing the latency assumed for memory access instructions in a modulo-scheduled loop can also increase the initiation interval (II), so performance does not necessarily improve. The CSMS and FLMS algorithms change memory access latencies while trying not to increase the II. This work improves both algorithms so that they adjust memory access latencies according to cache profiling information, which makes them more accurate than previous methods. Experimental results show that the new method effectively improves program performance: by about 1% on average on the SPEC2000 benchmarks, and by as much as 11% in individual cases.
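The tension described above, where raising the latency assumed for a load can raise the initiation interval, can be illustrated with a small sketch. It computes the standard lower bound MII = max(ResMII, RecMII) used in iterative modulo scheduling, then applies a hypothetical greedy rule: a load whose profiled miss rate exceeds a threshold is scheduled with the miss latency only if that does not raise the MII. This is not the paper's improved CSMS/FLMS algorithm; the names (`Op`, `Dep`, `assign_profiled_latencies`), the miss-rate threshold, the hit/miss latencies, and the profile numbers are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    unit: str            # functional-unit class, e.g. "mem" or "alu"
    latency: int         # latency (cycles) the scheduler assumes for this op
    is_load: bool = False


@dataclass(frozen=True)
class Dep:
    src: str
    dst: str
    distance: int        # iteration distance: 0 = same iteration, 1 = next one


def res_mii(ops, units):
    """Resource-constrained lower bound on the initiation interval."""
    counts = {}
    for op in ops:
        counts[op.unit] = counts.get(op.unit, 0) + 1
    return max(math.ceil(n / units[u]) for u, n in counts.items())


def has_positive_cycle(ops, deps, ii):
    """True if some recurrence cannot complete within `ii` cycles per iteration.

    Edge weight = latency(src) - ii * distance. A positive-weight cycle means
    `ii` is too small; detected via Floyd-Warshall longest paths.
    """
    idx = {op.name: i for i, op in enumerate(ops)}
    lat = {op.name: op.latency for op in ops}
    n = len(ops)
    neg = -math.inf
    d = [[neg] * n for _ in range(n)]
    for e in deps:
        i, j = idx[e.src], idx[e.dst]
        d[i][j] = max(d[i][j], lat[e.src] - ii * e.distance)
    for k in range(n):
        for i in range(n):
            if d[i][k] == neg:
                continue
            for j in range(n):
                if d[k][j] != neg and d[i][k] + d[k][j] > d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return any(d[i][i] > 0 for i in range(n))


def rec_mii(ops, deps):
    """Recurrence-constrained bound; assumes every cycle has distance >= 1."""
    ii = 1
    while has_positive_cycle(ops, deps, ii):
        ii += 1
    return ii


def mii(ops, deps, units):
    return max(res_mii(ops, units), rec_mii(ops, deps))


def assign_profiled_latencies(ops, deps, units, miss_rate, hit_lat, miss_lat,
                              threshold=0.1):
    """Hypothetical greedy policy: use the miss latency for frequently missing
    loads, but only when doing so leaves the MII unchanged."""
    for op in ops:
        if op.is_load:
            op.latency = hit_lat
    base = mii(ops, deps, units)
    candidates = sorted(
        (op for op in ops
         if op.is_load and miss_rate.get(op.name, 0.0) >= threshold),
        key=lambda op: miss_rate[op.name], reverse=True)
    for op in candidates:
        op.latency = miss_lat
        if mii(ops, deps, units) > base:
            op.latency = hit_lat  # would lengthen the II: fall back to hit latency
    return {op.name: op.latency for op in ops if op.is_load}, base


if __name__ == "__main__":
    ops = [
        Op("ld_a", "mem", 1, is_load=True),  # a[i]: independent per-iteration load
        Op("ld_p", "mem", 1, is_load=True),  # p = p->next: pointer chasing
        Op("add", "alu", 1),
        Op("st", "mem", 1),
    ]
    deps = [
        Dep("ld_a", "add", 0),
        Dep("ld_p", "add", 0),
        Dep("add", "st", 0),
        Dep("ld_p", "ld_p", 1),  # next iteration's load needs this one's result
    ]
    profile = {"ld_a": 0.40, "ld_p": 0.50}  # made-up miss rates
    lats, ii = assign_profiled_latencies(ops, deps, {"mem": 2, "alu": 1},
                                         profile, hit_lat=1, miss_lat=8)
    print(lats, ii)  # {'ld_a': 8, 'ld_p': 1} 2
```

In this toy loop, the pointer-chasing load sits on a recurrence, so giving it the miss latency would push the MII from 2 to 8 and the sketch keeps its hit latency; the independent load `ld_a` can absorb the longer latency without changing the MII. A real modulo scheduler would also check the II it actually achieves (not just the MII lower bound) and account for register pressure.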
Source: 《计算机研究与发展》 (Journal of Computer Research and Development), 2008, No. 5, pp. 834-840 (7 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: National Key Basic Research and Development Program of China ("973" Program) (2005CB321602)
Keywords: software pipelining; modulo scheduling; cache profiling; memory access latency; high performance computing

References

  • [1] F J Sanchez, A Gonzalez. Cache sensitive modulo scheduling[C]. Int'l Conf on Parallel Architectures and Compilation Techniques, Newport Beach, USA, 1999
  • [2] Liu Li, Li Wenlong, Chen Yu, Li Shengmei, Tang Zhizhong. Methods for hiding memory latency in software pipelining (软件流水中隐藏存储延迟的方法)[J]. Journal of Software (软件学报), 2005, 16(10): 1833-1841
  • [3] Zhou Qian, Feng Xiaobing, Zhang Zhaoqing. Cache profiling technique (Cache Profiling技术)[J]. Computer Engineering (计算机工程), 2006, 32(13): 47-48
  • [4] Open Research Compiler for Itanium Processor[OL]. http://ipf-orc.sourforge.net, 2005
  • [5] R A Huff. Lifetime-sensitive modulo scheduling[C]. ACM SIGPLAN Conf on Programming Language Design and Implementation, Albuquerque, New Mexico, 1993
  • [6] B R Rau. Iterative modulo scheduling: An algorithm for software pipelining loops[C]. The 27th Int'l Symp on Microarchitecture, San Jose, California, 1994
  • [7] J Llosa, A Gonzalez, E Ayguade, et al. Swing modulo scheduling: A lifetime-sensitive approach[C]. Int'l Conf on Parallel Architectures and Compilation Techniques, Boston, USA, 1996
  • [8] A K Dani, V J Ramanan, R Govindarajan. Register-sensitive software pipelining[C]. The Merged 12th Int'l Parallel Processing Symp and 9th Int'l Symp on Parallel and Distributed Systems, Orlando, Florida, 1998
  • [9] Chen Ding, Steve Carr, Phil Sweany. Modulo scheduling with cache reuse information[C]. Euro-Par'97, Passau, 1997
