Software Pipelining Guided by Cache Profiling Information (Times cited: 1)

Software Pipelining with Cache Profiling Information

Abstract: Software pipelining is an important instruction scheduling technique that speeds up loops by overlapping the execution of instructions from successive iterations. As the gap between processor speed and memory speed keeps widening, memory access instructions, especially those that cause cache misses, have increasingly become the bottleneck that limits performance. Because the latency of these instructions is not fixed, predicting and hiding it during software pipelining is essential. Unlike previous approaches to predicting memory access latency, this paper introduces cache profiling: runtime profile information is collected to predict memory access latencies, and the loop is scheduled accordingly. When the latency of memory access instructions in a modulo-scheduled loop is increased, the initiation interval (II) may increase as well, so performance does not necessarily improve. The CSMS and FLMS algorithms change memory access latencies while trying to avoid increasing the II. This paper improves the CSMS and FLMS algorithms so that they change memory access latencies according to cache profiling information, which makes them more accurate than previous methods. Experimental results show that the new method effectively improves program performance: on the SPEC2000 benchmarks it improves performance by about 1% on average, and by as much as 11% in individual cases.
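
The trade-off described in the abstract, where lengthening the scheduled latency of a load can raise the initiation interval, can be illustrated with the standard lower bound MII = max(ResMII, RecMII) used by iterative modulo scheduling. The Python sketch below is a minimal illustration under stated assumptions, not the authors' modified CSMS/FLMS algorithms: the `Op`/`Edge` loop model, the miss-rate `threshold`, the 10-cycle `miss_lat`, and the `assign_latencies` policy are all hypothetical. The policy schedules a load with the miss latency when its profiled miss rate exceeds the threshold, but only if doing so does not raise the MII.

```python
import math
from dataclasses import dataclass

# Hypothetical loop-body model for illustration; not the paper's data structures.
@dataclass
class Op:
    name: str
    resource: str           # functional-unit class, e.g. "mem" or "alu"
    latency: int            # latency the modulo scheduler assumes for this op
    miss_rate: float = 0.0  # profiled cache-miss rate (meaningful for loads only)

@dataclass
class Edge:
    src: str
    dst: str
    distance: int           # iteration distance of the dependence (0 = same iteration)

def res_mii(ops, units):
    """ResMII: max over resource classes of ceil(ops using it / units available)."""
    uses = {}
    for op in ops:
        uses[op.resource] = uses.get(op.resource, 0) + 1
    return max(math.ceil(n / units[r]) for r, n in uses.items())

def rec_mii(ops, circuits):
    """RecMII: max over dependence circuits of ceil(total latency / total distance)."""
    lat = {op.name: op.latency for op in ops}
    best = 1
    for circuit in circuits:
        total_lat = sum(lat[e.src] for e in circuit)
        total_dist = sum(e.distance for e in circuit)
        best = max(best, math.ceil(total_lat / total_dist))
    return best

def mii(ops, units, circuits):
    return max(res_mii(ops, units), rec_mii(ops, circuits))

def assign_latencies(ops, units, circuits, miss_lat, threshold):
    """Hypothetical policy: schedule a load with the miss latency if its profiled
    miss rate exceeds `threshold`, but only if that does not raise the MII."""
    base = mii(ops, units, circuits)
    for op in ops:
        if op.resource != "mem" or op.miss_rate <= threshold:
            continue
        old = op.latency
        op.latency = miss_lat
        if mii(ops, units, circuits) > base:
            op.latency = old  # would stretch the II: keep the original latency
    return base

# Example: a pointer-chasing load on a recurrence vs. a streaming load.
ops = [
    Op("ld_p", "mem", 2, miss_rate=0.5),  # p = p->next: the load feeds itself
    Op("ld_a", "mem", 2, miss_rate=0.3),  # a[i]: not on any recurrence
    Op("st", "mem", 1),
    Op("add", "alu", 1),                  # sum += ...: 1-cycle recurrence
]
circuits = [[Edge("ld_p", "ld_p", 1)], [Edge("add", "add", 1)]]
units = {"mem": 2, "alu": 1}
ii = assign_latencies(ops, units, circuits, miss_lat=10, threshold=0.1)
print(ii, {op.name: op.latency for op in ops})
# prints: 2 {'ld_p': 2, 'ld_a': 10, 'st': 1, 'add': 1}
```

In this example the pointer-chasing load `ld_p` lies on a loop-carried dependence circuit, so giving it a 10-cycle latency would push RecMII from 2 to 10; the policy therefore keeps its original latency. The streaming load `ld_a` lies on no circuit, so it can absorb the miss latency without changing the MII. A real scheduler would also have to consider register pressure and the II actually achieved, which this sketch ignores.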

Authors: Zhou Qian, Feng Xiaobing, Zhang Zhaoqing

Affiliation: Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences

Source: Journal of Computer Research and Development (indexed in EI, CSCD, and the Peking University core journal list), 2008, No. 5, pp. 834-840 (7 pages)

Funding: National Basic Research Program of China (973 Program) (2005CB321602)

Keywords: software pipelining; modulo scheduling; cache profiling; memory access latency; high performance computing

CLC number: TP314 [Automation and Computer Technology: Computer Software and Theory]

References (9)

[1] F J Sanchez, A Gonzalez. Cache sensitive modulo scheduling[C]. Int'l Conf on Parallel Architectures and Compilation Techniques, Newport Beach, USA, 1999
[2] Liu Li, Li Wenlong, Chen Yu, Li Shengmei, Tang Zhizhong. Methods for hiding memory latency in software pipelining[J]. Journal of Software, 2005, 16(10): 1833-1841. (Cited by: 6)
[3] Zhou Qian, Feng Xiaobing, Zhang Zhaoqing. Cache profiling technique[J]. Computer Engineering, 2006, 32(13): 47-48. (Cited by: 2)
[4] Open Research Compiler for Itanium Processor[OL]. http://ipf-orc.sourceforge.net, 2005
[5] R A Huff. Lifetime-sensitive modulo scheduling[C]. ACM SIGPLAN Conf on Programming Language Design and Implementation, Albuquerque, New Mexico, 1993
[6] B R Rau. Iterative modulo scheduling: An algorithm for software pipelining loops[C]. The 27th Int'l Symp on Microarchitecture, San Jose, California, 1994
[7] J Llosa, A Gonzalez, E Ayguade, et al. Swing modulo scheduling: A lifetime-sensitive approach[C]. Int'l Conf on Parallel Architectures and Compilation Techniques, Boston, USA, 1996
[8] A K Dani, V J Ramanan, R Govindarajan. Register-sensitive software pipelining[C]. The Merged 12th Int'l Parallel Processing Symp and 9th Int'l Symp on Parallel and Distributed Systems, Orlando, Florida, 1998
[9] Chen Ding, Steve Carr, Phil Sweany. Modulo scheduling with cache reuse information[C]. Euro-Par '97, Passau, 1997

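Citing articles (1)

[1] Xi Fang, Zhou Kuanjiu, Liu Xiaoyan. Design and implementation of a virtual simulation test platform based on SPARC V8[J]. Computer Applications and Software, 2010, 27(11): 166-168. (Cited by: 2)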
