
The Data-Flow Block Based Spatial Instruction Scheduling Method (基于数据流块的空间指令调度方法)
Abstract Clustered superscalar processors partition hardware resources to avoid the energy and cycle-time penalties incurred by large, monolithic structures. Dynamic multi-core processors fuse the hardware resources of several physical cores to provide computation capability that adapts to application demands. These architectures achieve energy-efficient computation through carefully orchestrated use of spatially distributed hardware resources. Problems such as instruction load imbalance and inter-partition operand forwarding latency may cause performance penalties, so an effective spatial instruction scheduling method is needed to distribute computation among the partitions of spatial architectures. We present the data-flow block (DFB) based spatial instruction scheduling method. DFBs are dynamically constructed, cached, and reused schedule patterns for one or more sequentially executed instruction basic blocks. The DFB scheduling algorithm models the data-flow constraints of the dynamic instruction stream and the scheduling space defined by hardware resources, then makes scheduling decisions according to each instruction's relative criticality, which is its quantified scheduling slack. We present the microarchitectural framework and algorithm for DFB scheduling. Through experiments with microarchitecture parameters closely related to the scheduling method, such as partition count, inter-partition latency, and schedule window capacity, we show that ideal DFB scheduling performs better and more stably than round-robin (load-balancing) and dependence-based scheduling. Finally, we show that the scheduling performance with an example DFB cache implementation is close to that of ideal DFB scheduling.
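The abstract names the ingredients of DFB scheduling (data-flow constraints, a scheduling space bounded by hardware resources, and slack as a measure of criticality) but does not give the algorithm itself. The following is a minimal illustrative sketch of slack-guided placement, not the paper's method. The function names `compute_slack` and `schedule`, the one-issue-per-cycle partition model, the uniform `hop_latency`, and the sample block `LATENCY`/`DEPS` are all assumptions made for illustration.

```python
from collections import defaultdict


def compute_slack(latency, deps):
    """Return scheduling slack per instruction (0 means on the critical path).

    latency: dict instr -> cycles, listed in program order (producers first).
    deps:    dict instr -> iterable of producer instrs.
    """
    order = list(latency)
    # Earliest start: wait for every producer to finish.
    asap = {}
    for ins in order:
        asap[ins] = max((asap[p] + latency[p] for p in deps.get(ins, ())), default=0)
    length = max(asap[ins] + latency[ins] for ins in order)
    # Invert the dependence edges to find each instruction's consumers.
    users = defaultdict(list)
    for ins in order:
        for p in deps.get(ins, ()):
            users[p].append(ins)
    # Latest start that does not lengthen the block.
    alap = {}
    for ins in reversed(order):
        alap[ins] = min((alap[u] for u in users[ins]), default=length) - latency[ins]
    return {ins: alap[ins] - asap[ins] for ins in order}


def schedule(latency, deps, num_partitions, hop_latency):
    """Assign each instruction to a partition (hypothetical greedy policy).

    An instruction may tolerate a start delay up to its slack; among the
    partitions that respect this bound, the least-loaded one is chosen.
    """
    slack = compute_slack(latency, deps)
    place, ready = {}, {}
    free_at = [0] * num_partitions  # assumed: one issue per cycle per partition
    load = [0] * num_partitions
    for ins in latency:
        starts = []
        for part in range(num_partitions):
            # Operands from another partition arrive hop_latency cycles later.
            operands = max(
                (ready[p] + (0 if place[p] == part else hop_latency)
                 for p in deps.get(ins, ())),
                default=0,
            )
            starts.append(max(operands, free_at[part]))
        limit = min(starts) + slack[ins]
        candidates = [p for p in range(num_partitions) if starts[p] <= limit]
        part = min(candidates, key=lambda p: load[p])
        place[ins] = part
        ready[ins] = starts[part] + latency[ins]
        free_at[part] = starts[part] + 1
        load[part] += 1
    return place


# Hypothetical block: a three-instruction chain (a, b, c) and a shorter chain (d, e).
LATENCY = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
DEPS = {"b": ["a"], "c": ["b"], "e": ["d"]}
```

With two partitions and a two-cycle inter-partition latency, `schedule(LATENCY, DEPS, 2, 2)` keeps the critical chain in one partition and places the independent chain in the other, which avoids forwarding delay on the critical path while still spreading load.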
Authors Liu Bingtao (刘炳涛); Wang Da (王达); Ye Xiaochun (叶笑春); Fan Dongrui (范东睿); Zhang Zhimin (张志敏); Tang Zhimin (唐志敏) (State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049; Institute of Information and Control, Hangzhou Dianzi University, Hangzhou 310018)
Source Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2017, No. 4, pp. 750-763 (14 pages)
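Funding National Key Research and Development Program of China (2016YFB0200501); National Natural Science Foundation of China (61332009, 61521092, 61671196, 61327902); Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2016A04); Beijing Municipal Science and Technology Commission special science and technology program (Z15010101009)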
Keywords processor microarchitecture; load balancing; instruction scheduling; data-flow; critical path
